RFR :7088419 : (L) Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32
david.r.chase at oracle.com
Fri May 17 14:18:58 UTC 2013
On 2013-05-17, at 9:55 AM, Doug Lea <dl at cs.oswego.edu> wrote:
> A quick note while travelling...
>>> [*] In JDK 8 the Arrays.parallelSort methods use an array length
>>> threshold of 2^13, arrays <= to that length will be sorted sequentially
>>> and arrays > than will be sorted in parallel. IIUC the threshold was
>>> chosen to ensure that Arrays.parallelSort is ~ the same as or faster
>>> than Arrays.sort.
>> I picked a threshold for 512K for slow CRC and 1M for Adler and fast CRC,
>> in a conservative approximation of the same reasoning.
> The 8K threshold in sort is in part to avoid memory contention
> across threads, especially for Object/Reference sort, but also
> overhead vs throughput tradeoffs. For CRC, my offhand guess is
> that something closer to 64K would be close to optimal on
> most machines. Perhaps Aleksey and friends could supply a
> threshold vs throughput plot on a few machines?
I would hold off a bit on Aleksey's time; following a tip from David Holmes, I added a call to tryUnfork, and it nicely improved both performance and predictability. (Why is this not the default? Think of the world of naive users out there, ready to shoot themselves in the foot with whatever new toy we hand them.) My plan is to set a bunch of my crappy benchmarks running on a diversity of machines and, concurrently, to try to figure out how to use JMH.
My informally eyeballed threshold, based on extremely preliminary benchmarking, would be 4x or 8x smaller than what I picked -- so 128K or 256K for fast CRC and Adler, 64K or 128K for slow CRC (fast CRC is about 2.5x faster than slow CRC). The best value tends to be higher on Intel and lower on SPARC, but I'd prefer not to bake that much platform sensitivity into the code. Note that my informal crappy benchmarking does have one positive attribute -- it takes place on a laptop that is doing lots of other stuff, so though the results are a little noisy, they also reflect exactly the sort of contention for resources that has us a little nervous about willy-nilly use of fork/join. So far, so good.
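For readers following along, here is a rough sketch of the shape being discussed: a size threshold below which the checksum is computed sequentially, a fork/join split above it, and a tryUnfork call before falling back to join. This is not the code under review. The class name, the 128K threshold constant, and the combine helper are assumptions made for illustration, and Adler32 is used only because merging two partial Adler-32 values is short enough to show inline (merging partial CRC32 values needs considerably more machinery).

```java
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.zip.Adler32;

// Hypothetical sketch, not the patch under review.
class ParallelAdler32Sketch {
    // Assumed threshold; the thread suggests something in the 128K-256K range.
    static final int THRESHOLD = 128 * 1024;
    // Modulus used by Adler-32 (largest prime below 2^16).
    static final long MOD = 65521L;

    // Plain single-threaded checksum of buf[off .. off+len).
    static long sequential(byte[] buf, int off, int len) {
        Adler32 a = new Adler32();
        a.update(buf, off, len);
        return a.getValue();
    }

    // Merge the checksums of two adjacent blocks X and Y, where len2 = |Y|.
    // Low half: A = A1 + A2 - 1; high half: B = B1 + B2 + len2 * (A1 - 1), all mod 65521.
    static long combine(long adler1, long adler2, long len2) {
        long a1 = adler1 & 0xffff, b1 = (adler1 >>> 16) & 0xffff;
        long a2 = adler2 & 0xffff, b2 = (adler2 >>> 16) & 0xffff;
        long a = (a1 + a2 + MOD - 1) % MOD;
        long b = (b1 + b2 + (len2 % MOD) * ((a1 + MOD - 1) % MOD)) % MOD;
        return (b << 16) | a;
    }

    static final class Task extends RecursiveTask<Long> {
        final byte[] buf;
        final int off, len;

        Task(byte[] buf, int off, int len) {
            this.buf = buf;
            this.off = off;
            this.len = len;
        }

        @Override
        protected Long compute() {
            if (len <= THRESHOLD) {
                return sequential(buf, off, len);
            }
            int half = len >>> 1;
            Task right = new Task(buf, off + half, len - half);
            right.fork();
            long left = new Task(buf, off, half).compute();
            // If nobody stole the right half, reclaim it and run it here;
            // otherwise wait for the thread that took it.
            long r = right.tryUnfork() ? right.compute() : right.join();
            return combine(left, r, len - half);
        }
    }

    // Entry point: small inputs never touch the pool.
    static long checksum(byte[] buf) {
        if (buf.length <= THRESHOLD) {
            return sequential(buf, 0, buf.length);
        }
        return ForkJoinPool.commonPool().invoke(new Task(buf, 0, buf.length));
    }

    public static void main(String[] args) {
        byte[] data = new byte[4 * 1024 * 1024];
        new Random(1).nextBytes(data);
        long expected = sequential(data, 0, data.length);
        long actual = checksum(data);
        System.out.println("match: " + (expected == actual));
    }
}
```

Halving recursively keeps the leaf size between half the threshold and the threshold, so changing THRESHOLD is the only knob needed to reproduce a threshold vs throughput comparison on a given machine.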