RFR :7088419 : (L) Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32
david.r.chase at oracle.com
Fri May 31 07:42:43 PDT 2013
On 2013-05-31, at 9:52 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
> On 31/05/2013 03:17, David Chase wrote:
>> Not sure where this stands, given that Vladimir K has a grand plan to turn the assembly language into an intrinsic (this is not how I would normally approach it) but there is another webrev with the unnecessary import removed:
> I guess this comes down to timing and which releases the improvement is required in. If the intrinsic that Vladimir is proposing is more longer term, and you are looking to get this is so that it can also go into jdk7u, then it might be okay to just push your implementation now (as the speed-up is compelling). As I mentioned in one of the mails, then the passing through of the property to enable it is a bit awkward but you explained that detecting the processor feature is complicated and the duplication is best avoid. Also the XX option although if this were compiled C then the XX option wouldn't have any effect.
The current plumbing for the XX option does affect code flow in the compiled C; that was pretty much the entire point of it. Vladimir has a plan, I think his attitude is that if convert this into an intrinsic then we won't care about compiler version or updating the code to get rid of the assembler, it will just be done, for good. My POV is that for now it's done, and we don't have to touch it for a good long while (except that I left Windows unoptimized).
> Some minor nits on CRC32 in the latest webrev:
> - as "javaCRCIfSmallThan" is a constant then it can be final and probably renamed to uppercase to be consistent. Looks like timesXtoThe32 can be final too.
> - the static initializer is use 2-space indent whereas we usually use 4
I will get those. I was unsure on whether these things would always be final; if we were willing to admit the existence of platform-dependent performance differences, then they would be different on different platforms (JNI seems to be somewhat more expensive on Sparc, if I recall correctly) and it also differs a little based on use of the "fast crc" path, but it is not a huge difference in either case.
> Otherwise I don't have any other issues.
> Are you still planning to adding the parallel version?
I would like to do a parallel version, but I would like to get this in to 8 and maybe 7, and it seems to me that leaving out the parallel improvements makes that more likely. I think it is wrong-headed to declare that use of parallelism behind the curtain constitutes an interface change (especially in cases like this where the input/output semantics are exactly specified and followed), but other people seem to think otherwise, and hashing that out will just take time. It would be nice to also deliver some performance boost to Sparc boxes, and the parallel version does this quite nicely. I suppose I should file a bug/RFE for this, thought it would also be interesting to go looking for other parts of the JDK that might be helped by parallelism (as was mentioned in another review, how about those Big Integers?)
Another thing, that I think is more pie-in-the-sky, is that we need better interfaces for measurement. We *say* we are worried about power consumption, but do you see any probes for measuring it? When I see slow startup of fork-join, I can't even tell if what I am seeing is some sort of incredibly expensive initialization (consumes power) or a slumbering processor waking up to work (extra time needed does not reflect power consumption -- in fact it reflects a LACK of power consumption).
More information about the hotspot-compiler-dev