BigInteger Burnikel-Ziegler division algorithm crossover heuristic
Brian Burkhalter
brian.burkhalter at oracle.com
Fri Nov 22 21:47:48 UTC 2013
Hello,
We've been doing fairly extensive micro-benchmark testing of the effect of adjusting the various algorithm crossover thresholds in BigInteger (cf. https://bugs.openjdk.java.net/browse/JDK-8022181). There'll be another post on that particular topic in the near future. This testing did however expose another topic for which an issue is not yet (but likely will be) on file, i.e., the heuristic for determining when to use Burnikel-Ziegler divide and conquer division instead of "standard" division.
At present the test used is
if divisor_int_length < BZ_THRESHOLD or dividend_int_length < BZ_THRESHOLD then
use standard division
else
use B-Z division
(where BZ_THRESHOLD is currently 50).
Let {N,D} represent a dividend-divisor pair with the int-length of the former being N and that of the latter being D. The following cases were among those initially tested: {T,2T}, {T,T+2}, {T,T}, {T+2,T}, and {2T,T} where T is the BZ threshold. In all but the last case, where the dividend is twice the length of the divisor, a (sometimes serious) performance regression was observed.
The implementation was reviewed along with the original paper [1]. The implementation is accurate with respect to the paper as had already been verified some months ago [2]. It was observed however that something interesting happens for the case where the int-length of the dividend A is equal to that of the divisor B. For this situation, the algorithm parameter t (step 5 in algorithm 3 in the paper) becomes 2. This causes A to be virtually padded out to a 2T-length value [0,A]. The net result is that the BZ algorithm ends up being applied in effect to a pair {2T,T} which is significantly slower than using the standard algorithm on the pair {T,T}. This appears to be primarily due to the algorithm overhead rather than a length dependent problem. For the {T,T} case, the following recursive call sequence of significant operations ensues:
divide
divideBZ
divide2n1n
divide3n2n
divide2n1n
divideKnuth
multiply
subtract
divide3n2n
divide2n1n
divideKnuth
multiply
subtract
add
Despite the fact that some of the method calls reduce to simple operations like dividing zero by non-zero, dividing something by itself, or multiplying something by zero or one, the algorithm overhead causes the performance to be worse than had this sequence been used:
divide
divideKnuth
Through micro-benchmark testing it was discovered that, given the original BZ_THRESHOLD criterion is met, for some value C the BZ algorithm will start to outperform the standard algorithm for the case {T+C,T}, i.e., the dividend is C ints longer than the divisor. Testing supports the value of the offset C being independent of the value of the BZ threshold. This suggests that the following heuristic would be more appropriate for this algorithm:
if divisor_int_length < BZ_THRESHOLD or dividend_int_length - divisor_int_length < BZ_OFFSET_THRESHOLD then
use standard division
else
use B-Z division
Any comments on this alternative heuristic would be appreciated as I would not be in the least bit surprised to have missed some fine point altogether.
Thanks,
Brian
[1] Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division", Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022, http://www.mpi-sb.mpg.de/~ziegler/TechRep.ps.gz.
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-July/018922.html
More information about the core-libs-dev
mailing list