8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
ecki at zusammenkunft.net
Thu Dec 20 12:58:05 UTC 2018
Hm, strange, I never saw it this way. Would other types have the same problem? (It should be visible in your histogram for long as well, right?)
From: core-libs-dev <core-libs-dev-bounces at openjdk.java.net> on behalf of Brian Burkhalter <brian.burkhalter at oracle.com>
Sent: Thursday, December 20, 2018 8:03 AM
Subject: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
This issue was filed to cover improving the uniformity of randomly generated BigIntegers. It is not intended to resolve  which is deliberately left open. The proposed patch implements a modified version of the “workaround” suggested in .
The problem is that the magnitude of the random BigInteger is created from a sequence of bytes generated by Random.nextBytes(). The likelihood that any of these bytes is zero is small, so the distribution of the resulting random BigIntegers is skewed towards values whose bit length is close to the maximum bit size “numBits” specified to the constructor.
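A minimal sketch of the skew (the class and method names are hypothetical): it tallies bitLength() over many values from new BigInteger(numBits, rnd). Because the Javadoc describes these values as uniform over [0, 2^numBits - 1], each bit is independently random, so about half of the values should have bit length numBits, a quarter numBits - 1, and so on, which is the clustering described above.

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical helper: tallies the bit lengths produced by BigInteger(int, Random).
class BitLengthSkew {
    // counts[b] = number of samples whose bitLength() == b, for b in 0..numBits
    static long[] bitLengthCounts(int numBits, int samples, long seed) {
        Random rnd = new Random(seed);
        long[] counts = new long[numBits + 1];
        for (int i = 0; i < samples; i++) {
            counts[new BigInteger(numBits, rnd).bitLength()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int numBits = 256;
        int samples = 100_000;
        long[] counts = bitLengthCounts(numBits, samples, 42L);
        // The top few bit lengths should appear with frequencies near 1/2, 1/4, 1/8, ...
        for (int b = numBits; b > numBits - 5; b--) {
            System.out.printf("bitLength %d: %.4f%n", b, counts[b] / (double) samples);
        }
    }
}
```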
The workaround suggested in  is to replace numBits with a random value, numBits = Random.nextInt(numBits + 1). (The workaround as originally suggested uses nextInt(numBits), which is incorrect because the parameter is an exclusive upper bound.) This greatly improves the uniformity of the distribution. A remaining problem, however, is that the very largest numbers in the interval [0, 2^numBits) are now underrepresented. A modification of this approach is to increment the new value of numBits: numBits = Random.nextInt(numBits + 1) + 1. This was empirically observed to reduce the underrepresentation of the largest values.
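A sketch of the two variants, assuming they are applied by drawing the new bit length first and then calling the existing constructor; the class and method names are hypothetical and this is not the patch itself. Since Random.nextInt(n) returns a value in [0, n), the second formula taken literally yields a bit length in [1, numBits + 1], so values up to 2^(numBits + 1) - 1 are possible unless the result is bounded elsewhere.

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical helpers illustrating the two workaround variants.
class RandomBitLength {
    // Variant 1: draw the bit length uniformly from [0, numBits], then draw a value.
    static BigInteger variant1(int numBits, Random rnd) {
        int k = rnd.nextInt(numBits + 1);   // 0 <= k <= numBits
        return new BigInteger(k, rnd);      // uniform over [0, 2^k); k == 0 gives zero
    }

    // Variant 2: the formula exactly as written, nextInt(numBits + 1) + 1,
    // so 1 <= k <= numBits + 1.
    static BigInteger variant2(int numBits, Random rnd) {
        int k = rnd.nextInt(numBits + 1) + 1;
        return new BigInteger(k, rnd);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1L);
        for (int i = 0; i < 5; i++) {
            System.out.println(variant1(16, rnd) + " " + variant2(16, rnd));
        }
    }
}
```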
The distribution of the random BigIntegers was estimated using . For a given maximum bit length, bin size, and number of random values to generate, this creates a histogram and calculates the coefficient of variation of the numbers generated. The histogram bin at index zero holds the largest values. Each value is counted in the bin that contains its leftmost (most significant) non-zero bit. The bin of maximum index represents zero.
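The tool used for the measurements is not shown, so the following is only a sketch of how such a histogram and coefficient of variation might be computed. The class name, the bin arithmetic (which assumes binSize divides numBits), and the choice to compute the coefficient of variation over the values themselves, scaled by 2^-numBits so that doubles do not overflow for 4096-bit values, are all assumptions.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the histogram and coefficient of variation.
class TopBitHistogram {
    // Bin 0 holds values whose leftmost set bit lies in the top binSize bits;
    // the last bin (index numBits / binSize) holds zero. Assumes binSize divides numBits.
    static int binIndex(BigInteger v, int numBits, int binSize) {
        if (v.signum() == 0) {
            return numBits / binSize;
        }
        return (numBits - v.bitLength()) / binSize;
    }

    static long[] histogram(List<BigInteger> values, int numBits, int binSize) {
        long[] bins = new long[numBits / binSize + 1];
        for (BigInteger v : values) {
            bins[binIndex(v, numBits, binSize)]++;
        }
        return bins;
    }

    // Coefficient of variation (population stddev / mean) of the values.
    // Each value is approximated as v / 2^numBits using its top 53 bits;
    // the ratio is scale-invariant, so only the truncation introduces error.
    static double coefficientOfVariation(List<BigInteger> values, int numBits) {
        int shift = Math.max(0, numBits - 53);
        double sum = 0, sumSq = 0;
        for (BigInteger v : values) {
            double x = Math.scalb(v.shiftRight(shift).doubleValue(), shift - numBits);
            sum += x;
            sumSq += x * x;
        }
        int n = values.size();
        double mean = sum / n;
        double variance = Math.max(0.0, sumSq / n - mean * mean);
        return Math.sqrt(variance) / mean;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);
        List<BigInteger> values = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            values.add(new BigInteger(256, rnd));
        }
        long[] bins = histogram(values, 256, 1);
        System.out.println("bin 0: " + bins[0] + ", bin 1: " + bins[1]);
        System.out.println("CV: " + coefficientOfVariation(values, 256));
    }
}
```

For values uniform over [0, 2^numBits) the coefficient of variation is close to 1/sqrt(3), about 0.577, which gives a baseline for comparing the variants.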
Results for the current approach and the two modified approaches, for 256 bits with a 1-bit bin size and for 4096 bits with a 4-bit bin size, are given at [6-11]. As can be seen, the original histogram is clustered towards the largest possible value, 2^numBits - 1, and the coefficient of variation is small. The results for the two variants of the patch show a flatter, i.e., more uniform, distribution and a significantly larger coefficient of variation. The second approach flattens both ends of the histogram better. These results are only samples, but they are representative of the results observed over numerous runs of this code.
The ModPow test is modified because the modPow() method throws an ArithmeticException for a zero modulus. In practice the current algorithm never generates a random BigInteger equal to zero, so that exception never occurs. That is not the case for either modified version.
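A sketch of one way a test could guard against a zero modulus once random values can be zero: re-draw until the value is non-zero. The helper name is hypothetical and the actual change to the ModPow test may differ; the bit length is drawn as in the first workaround variant. BigInteger.modPow documents an ArithmeticException when the modulus is less than or equal to zero.

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical test helper: re-draws a random modulus until it is non-zero,
// because modPow rejects a modulus m <= 0 with an ArithmeticException.
class NonZeroModulus {
    static BigInteger randomNonZero(int numBits, Random rnd) {
        if (numBits < 1) {
            throw new IllegalArgumentException("numBits must be >= 1");
        }
        BigInteger m;
        do {
            int k = rnd.nextInt(numBits + 1);  // bit length drawn as in the first variant
            m = new BigInteger(k, rnd);        // may be zero, e.g. when k == 0
        } while (m.signum() == 0);
        return m;
    }

    public static void main(String[] args) {
        Random rnd = new Random(5L);
        BigInteger base = new BigInteger(64, rnd);
        BigInteger exp = new BigInteger(16, rnd);
        BigInteger m = randomNonZero(64, rnd);
        System.out.println(base.modPow(exp, m));
        try {
            base.modPow(exp, BigInteger.ZERO);
        } catch (ArithmeticException e) {
            System.out.println("zero modulus rejected: " + e.getMessage());
        }
    }
}
```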