RFR :7088419 : (L) Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32
david.r.chase at oracle.com
Fri May 17 06:45:00 PDT 2013
On 2013-05-17, at 4:28 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> On May 17, 2013, at 12:09 AM, David Chase <david.r.chase at oracle.com> wrote:
>> On 2013-05-16, at 5:27 PM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>>> The current proposal doesn't change the API at this time but I wonder if you have considered adding parallelUpdate methods to complement the serial methods?
>> I'm not sure I understand.
> Alan may be suggesting that the caller should explicitly, but unobtrusively, opt in to parallel execution rather than it being something implicit.
Right, but I was responding to this in the context of a compiler bug -- they just wanted CRC32 and Adler32 to go faster, and were not asking for a new interface. My intent/assumption was that if I used the system fork-join pool for this and there were a problem with what I might call "obnoxious parallelism", the person running the application would turn down the parallelism in that pool. If we use fork-join to make other things go faster (e.g., BigInteger arithmetic and/or formatting), they'll have exactly the same problem. Either each of those uses has to deal with it independently (in the same way that I worry about whether the CRC/Adler code should be self-throttling), or we assume that it will be solved by the same knob being turned (by whoever is tuning the whole application).
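As a rough illustration of the "one knob" idea, here is a minimal sketch. It assumes that the system fork-join pool corresponds to the JDK 8 common pool (ForkJoinPool.commonPool()), whose size is controlled by the standard java.util.concurrent.ForkJoinPool.common.parallelism system property. The class name CommonPoolKnob is hypothetical.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class, for illustration only.
final class CommonPoolKnob {
    // Standard JDK 8+ property. It is read once, when the common pool is
    // initialized, so it is normally given on the command line.
    static final String PROPERTY =
        "java.util.concurrent.ForkJoinPool.common.parallelism";

    // Reports the parallelism that work submitted to the common pool will get.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + System.getProperty(PROPERTY));
        System.out.println("common pool parallelism = " + commonPoolParallelism());
    }
}
```

Whoever tunes the application could then run, for example, `java -Djava.util.concurrent.ForkJoinPool.common.parallelism=2 CommonPoolKnob`, and every library that uses the common pool would be throttled by that one setting.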
Baking it in as a static decision is just kicking the can down the road -- once some module that computes CRCs as part of its larger work chooses serial or parallel, what then, if that module's client doesn't like that decision?
Another option is to keep the interface as it is, but add a system property enabling or disabling the parallelism. I'm not sure that's the right approach; there is a sun. property that I use to check the tuning, but as soon as FJ is widely used (and perhaps statically baked into some larger modules, even if we adopt the ask-for-parallel method idiom), those selective-disable properties will multiply like bunnies, and I don't think that's a good idea (documentation, testing, arrgh).
> That is the approach we have taken for the Streams API, parallel sort and parallel prefix of arrays in JDK 8 [*]. It's a least surprise principle of libraries "not eating more compute resources unless you ask it to". I think Adler32 and CRC32 should stick to the same principle and require opt in from the caller, thus implying this behaviour cannot be part of the public API in JDK 7.
> [*] In JDK 8 the Arrays.parallelSort methods use an array length threshold of 2^13: arrays of that length or shorter are sorted sequentially, and longer arrays are sorted in parallel. IIUC the threshold was chosen to ensure that Arrays.parallelSort is about the same as or faster than Arrays.sort.
I picked thresholds of 512K for slow CRC and 1M for Adler and fast CRC, as a conservative approximation of the same reasoning.
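To make the threshold idea concrete, here is a minimal sketch of a threshold-gated, opt-in parallel Adler-32. It is not the code under review. The class ParallelAdler32Sketch and its methods are hypothetical, the name parallelUpdate is borrowed from Alan's suggestion, and the use of the JDK 8 common pool is an assumption. The 1M default comes from the figure above. Adler-32 is used because two partial checksums combine with a few modular operations; CRC-32 needs a more involved combine step and is left out.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.zip.Adler32;

// Hypothetical sketch: opt-in parallel Adler-32 with a serial cutoff.
final class ParallelAdler32Sketch {
    static final int MOD = 65521;                 // Adler-32 modulus
    static final int DEFAULT_THRESHOLD = 1 << 20; // 1M, as quoted for Adler

    // Plain serial checksum of buf[off, off + len).
    static long serial(byte[] buf, int off, int len) {
        Adler32 adler = new Adler32();
        adler.update(buf, off, len);
        return adler.getValue();
    }

    // Combines the checksums of two adjacent blocks; rightLen is the length
    // of the second block. Uses A = A1 + A2 - 1 and
    // B = B1 + B2 + rightLen * (A1 - 1), all modulo 65521.
    static long combine(long left, long right, long rightLen) {
        long a1 = left & 0xffff, b1 = (left >>> 16) & 0xffff;
        long a2 = right & 0xffff, b2 = (right >>> 16) & 0xffff;
        long n = rightLen % MOD;
        long a = (a1 + a2 + MOD - 1) % MOD;
        long b = (b1 + b2 + n * ((a1 + MOD - 1) % MOD)) % MOD;
        return (b << 16) | a;
    }

    private static final class ChunkTask extends RecursiveTask<Long> {
        private final byte[] buf;
        private final int off, len, threshold;

        ChunkTask(byte[] buf, int off, int len, int threshold) {
            this.buf = buf;
            this.off = off;
            this.len = len;
            this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            if (len <= threshold) {
                return serial(buf, off, len);     // small enough: stay serial
            }
            int half = len / 2;
            ChunkTask left = new ChunkTask(buf, off, half, threshold);
            ChunkTask right = new ChunkTask(buf, off + half, len - half, threshold);
            left.fork();                          // left half runs asynchronously
            long r = right.compute();             // right half runs in this thread
            long l = left.join();
            return combine(l, r, len - half);
        }
    }

    // The caller opts in by calling this method; inputs at or below the
    // threshold never touch the pool.
    static long parallelUpdate(byte[] buf, int threshold) {
        int t = Math.max(1, threshold);           // guard against endless splitting
        if (buf.length <= t) {
            return serial(buf, 0, buf.length);
        }
        return ForkJoinPool.commonPool().invoke(new ChunkTask(buf, 0, buf.length, t));
    }
}
```

A caller that wants the parallel path would write something like `long sum = ParallelAdler32Sketch.parallelUpdate(data, ParallelAdler32Sketch.DEFAULT_THRESHOLD);` and should get the same value as a serial java.util.zip.Adler32 over the same bytes.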
More information about the hotspot-compiler-dev mailing list