RFR :7088419 : (L) Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32
vladimir.kozlov at oracle.com
Fri May 31 10:43:28 PDT 2013
On 5/31/13 7:42 AM, David Chase wrote:
> On 2013-05-31, at 9:52 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>> On 31/05/2013 03:17, David Chase wrote:
>>> Not sure where this stands, given that Vladimir K has a grand plan to turn the assembly language into an intrinsic (this is not how I would normally approach it) but there is another webrev with the unnecessary import removed:
>> I guess this comes down to timing and which releases the improvement is required in. If the intrinsic that Vladimir is proposing is more longer term, and you are looking to get this is so that it can also go into jdk7u, then it might be okay to just push your implementation now (as the speed-up is compelling). As I mentioned in one of the mails, then the passing through of the property to enable it is a bit awkward but you explained that detecting the processor feature is complicated and the duplication is best avoid. Also the XX option although if this were compiled C then the XX option wouldn't have any effect.
> The current plumbing for the XX option does affect code flow in the compiled C; that was pretty much the entire point of it. Vladimir has a plan, I think his attitude is that if convert this into an intrinsic then we won't care about compiler version or updating the code to get rid of the assembler, it will just be done, for good. My POV is that for now it's done, and we don't have to touch it for a good long while (except that I left Windows unoptimized).
My biggest concern about the assembler code in jdk is its support. Even
if you document how you got ".byte 0xc4,0xe3", next person will have
very hard time to fix/modify this code. From my point of view it is dead
About intrinsic. It will work on all OSs (including Windows) and don't
need special versions of gcc, c++. But, it looks like, I need an other
week to finish it. And I am worried about startup performance of
intrinsic vs asm in jdk.
>> Some minor nits on CRC32 in the latest webrev:
>> - as "javaCRCIfSmallThan" is a constant then it can be final and probably renamed to uppercase to be consistent. Looks like timesXtoThe32 can be final too.
>> - the static initializer is use 2-space indent whereas we usually use 4
> I will get those. I was unsure on whether these things would always be final; if we were willing to admit the existence of platform-dependent performance differences, then they would be different on different platforms (JNI seems to be somewhat more expensive on Sparc, if I recall correctly) and it also differs a little based on use of the "fast crc" path, but it is not a huge difference in either case.
>> Otherwise I don't have any other issues.
>> Are you still planning to adding the parallel version?
> I would like to do a parallel version, but I would like to get this in to 8 and maybe 7, and it seems to me that leaving out the parallel improvements makes that more likely. I think it is wrong-headed to declare that use of parallelism behind the curtain constitutes an interface change (especially in cases like this where the input/output semantics are exactly specified and followed), but other people seem to think otherwise, and hashing that out will just take time. It would be nice to also deliver some performance boost to Sparc boxes, and the parallel version does this quite nicely. I suppose I should file a bug/RFE for this, thought it would also be interesting to go looking for other parts of the JDK that might be helped by parallelism (as was mentioned in another review, how about those Big Integers?)
> Another thing, that I think is more pie-in-the-sky, is that we need better interfaces for measurement. We *say* we are worried about power consumption, but do you see any probes for measuring it? When I see slow startup of fork-join, I can't even tell if what I am seeing is some sort of incredibly expensive initialization (consumes power) or a slumbering processor waking up to work (extra time needed does not reflect power consumption -- in fact it reflects a LACK of power consumption).
More information about the hotspot-compiler-dev