RFR (S): JDK-8191328: Avoid unnecessary overhead in CRC32C
dmitry.chuyko at bell-sw.com
Thu Nov 16 16:42:01 UTC 2017
On 11/15/2017 09:44 PM, Andrew Haley wrote:
> On 15/11/17 18:38, Vitaly Davidovich wrote:
>> On Wed, Nov 15, 2017 at 12:40 PM, Andrew Haley <aph at redhat.com> wrote:
>>> On 15/11/17 15:38, Alan Bateman wrote:
>>>> Moving the nativeOrder out of the loop makes sense but I'm curious about
>>>> the context for improving this implementation.
>>> I wonder about lifting ByteOrder.nativeOrder(). Maybe it fails to
>>> inline because the method is too large: if that happens, we really
>>> lose. I'm not seeing that, though: it seems to be inlined just fine,
>>> and has no effect.
Sure, it is the effect of missing inlining. But you can break inlining
relatively easily with your tiered JIT settings, for example -XX:-Inline
or -XX:MaxInlineLevel=0 (it would be no wonder to meet this one in the
wild). Not sure about AOT.
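To make the point concrete, here is a minimal sketch of the kind of hoisting under discussion. It is not the actual CRC32C code or the patch: the class name, the `mix` step and the loop shape are hypothetical stand-ins. It only shows a loop that queries ByteOrder.nativeOrder() on every iteration next to one that reads it once before the loop. When the call is inlined the two should behave the same; when inlining is disabled the first one pays for a call per iteration.

```java
import java.nio.ByteOrder;

public class NativeOrderHoist {
    // Hypothetical per-word update step; NOT the real CRC32C table lookup.
    static int mix(int crc, int word) {
        return Integer.rotateLeft(crc ^ word, 5) * 0x9E3779B1;
    }

    // Queries the native byte order on every iteration and relies on the
    // JIT inlining ByteOrder.nativeOrder() to make the check free.
    static int perIteration(int[] words) {
        int crc = ~0;
        for (int w : words) {
            if (ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN) {
                crc = mix(crc, w);
            } else {
                crc = mix(crc, Integer.reverseBytes(w));
            }
        }
        return ~crc;
    }

    // Reads the native byte order once, before the loop, so the result
    // does not depend on whether the call gets inlined.
    static int hoisted(int[] words) {
        final boolean little = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;
        int crc = ~0;
        for (int w : words) {
            crc = mix(crc, little ? w : Integer.reverseBytes(w));
        }
        return ~crc;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 0xCAFEBABE};
        // Both variants compute the same value; only the call placement differs.
        System.out.println(perIteration(data) == hoisted(data));
    }
}
```

Running both variants with and without -XX:-Inline is one way to see whether the per-iteration call matters on a given setup.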
>>> In any case, this patch doesn't help anything on my test hardware.
>> Is this with -Xcomp though? That can generate crap code because
>> there's no profiling information. Not that -Xcomp should be the way
>> to test peak performance IMO, but that is the setting that was used I
Another noticeable case is -Xint, where the absolute times of CRC
calculation are quite long.
Here is a benchmark that is easier to experiment with (no need to build
jdk or to turn off intrinsics):
Some x86 results:
before 380.957 ± 11.621 ns/op
after 350.838 ± 5.149 ns/op
before 656.791 ± 8.216 ns/op
after 340.999 ± 2.686 ns/op
before 36113.441 ± 197.716 ns/op
after 26928.593 ± 133.309 ns/op
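For reading the numbers above as ratios, a small sketch that divides each "before" time by the matching "after" time. The class and method names are made up for illustration; the inputs are just the mean values listed above, and the error margins are ignored.

```java
import java.util.Locale;

public class Speedup {
    // Ratio of mean times: values above 1.0 mean "after" is faster.
    static double speedup(double beforeNs, double afterNs) {
        return beforeNs / afterNs;
    }

    public static void main(String[] args) {
        // {before, after} mean ns/op, in the order the pairs are listed.
        double[][] pairs = {
            {380.957, 350.838},
            {656.791, 340.999},
            {36113.441, 26928.593},
        };
        for (double[] p : pairs) {
            System.out.println(String.format(Locale.ROOT,
                "before %.3f, after %.3f, ratio %.2fx", p[0], p[1], speedup(p[0], p[1])));
        }
    }
}
```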
> Shrug; maybe. We shouldn't mess the code up for -Xcomp.