RFR(L) 8073583: C2 support for CRC32C on SPARC
vladimir.kozlov at oracle.com
Thu Jun 25 23:55:19 UTC 2015
Here is the latest webrev, which has already been reviewed by me and John. We
think it is ready to be integrated.
Contributed by: James Cheng
On 5/4/15 8:49 PM, John Rose wrote:
> On May 4, 2015, at 8:04 PM, James Cheng <james.cheng at oracle.com> wrote:
>> Hi John,
>>> On May 4, 2015, at 6:21 PM, John Rose <john.r.rose at oracle.com> wrote:
>>> One more comment, which is at a higher level: Could we recode the loop control in Java and use unsafe to handle word and byte loads? Then we would only need single instruction intrinsics.
>> We could, I guess, but that means we’d need to rewrite the pure Java CRC32C in JDK.
>> More difficult is how we implement the CRC32C methods so that they are not favoring
>> one platform while hindering others. I am afraid that the CRC32C instructions on different
>> platforms are too different to compromise.
> That may be; the vector size is CPU-dependent, for example. But a 64-bit vector is (currently) the sweet spot for writing vectorized code in Java, since 'long' is the biggest bit container in Java. (Note also that HotSpot JVM objects are aligned up to 8 byte boundaries, even after GC.) Another platform with larger vectors would have to use assembly language anyway (which Intel does), but Java code can express 64-bit vectorized loops.
> For CRC, the desirable number of distinct streams, and the prefetch mode and distance, are also CPU-dependent. For those variations, injecting machine-specific parts into the Java-coded algorithm would get messy.
> The benefit of coding low-level vectorized loops in Java would be not having to code the loop logic in assembly code. If we could use byte buffers to manage the indexing, and/or had better array notations, it would probably be worthwhile moving from assembly to Java. At present it seems OK to code in assembly, *if* the assembly can be made more readable.
> We have a chicken and egg problem here: Nobody is going to experiment with Java-coded vector loops until we get single-vector CRC32[C] and XMULX instructions surfaced as C2-supported intrinsics. (We already have bit and byte reverse intrinsics, so that part is OK.)
> — John
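
For readers following the thread, here is a rough sketch of what John's suggestion could look like: the loop control lives in Java, a little-endian ByteBuffer manages the indexing, and the body consumes one 64-bit word per iteration. The class and method names (Crc32cSketch, stepByte, stepWord) are hypothetical and are not part of the webrev. stepWord is a plain software stand-in for the kind of single-instruction step that C2 might one day expose as an intrinsic; no such intrinsic is assumed to exist here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: Java-coded loop control around a per-word CRC32C step.
final class Crc32cSketch {
    // Reflected form of the CRC32C (Castagnoli) polynomial.
    private static final int POLY = 0x82F63B78;

    // Fold one byte into the running CRC, one bit at a time (software fallback).
    static int stepByte(int crc, int b) {
        crc ^= (b & 0xFF);
        for (int k = 0; k < 8; k++) {
            crc = (crc >>> 1) ^ (POLY & -(crc & 1));
        }
        return crc;
    }

    // Fold one 64-bit little-endian word into the running CRC.
    // Stand-in for a single-instruction step; here it just feeds 8 bytes, low byte first.
    static int stepWord(int crc, long word) {
        for (int i = 0; i < 8; i++) {
            crc = stepByte(crc, (int) (word >>> (8 * i)));
        }
        return crc;
    }

    // Loop logic in Java: 8 bytes per iteration, then a byte-wise tail.
    static int crc32c(byte[] data, int off, int len) {
        ByteBuffer buf = ByteBuffer.wrap(data, off, len).order(ByteOrder.LITTLE_ENDIAN);
        int crc = ~0;
        while (buf.remaining() >= 8) {
            crc = stepWord(crc, buf.getLong());
        }
        while (buf.hasRemaining()) {
            crc = stepByte(crc, buf.get());
        }
        return ~crc;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The standard CRC32C check value for this input is 0xE3069283.
        System.out.println(Integer.toHexString(crc32c(check, 0, check.length)));
    }
}
```

The point of the sketch is only the shape of the code: if stepWord were replaced by an intrinsic, the surrounding loop would not have to be written in assembly. The CPU-dependent choices John mentions (number of streams, prefetch mode and distance) are deliberately left out.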