RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic
Derek.White at cavium.com
Fri Oct 27 22:31:30 UTC 2017
The code looks good.
I have one suggestion for MacroAssembler::kernel_crc32(). It's a matter of taste, so it really is just a suggestion:
- The use of temp registers in the UseCRC32 case is somewhat muddled: it uses tmp and table0..table3 as temp registers, and the name "table" is confusing in this case.
- Maybe it would be cleaner to refactor the UseCRC32 code into a separate kernel_crc32_using_crc32() subroutine (static or macro?). This would accept the main args and 4 registers for temps. The caller can supply some combination of table or tmp registers.
- This would shrink the size of kernel_crc32() by a lot too.
- The next person to touch the UseNeon code could factor that out as well.
This obviously would apply to kernel_crc32c as well.
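To make the shape of that refactoring concrete, here is a rough sketch in plain C++. It is not HotSpot code: Reg and AsmSketch are hypothetical stand-ins for Register and MacroAssembler, the kernel_crc32() signature is simplified, and the "emitted" code is only recorded in a log. The point is just that the helper sees four anonymous temps and the caller decides which registers to lend.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical stand-in for HotSpot's Register; only an id is modelled.
struct Reg { int id; };

// Toy assembler that records which path was "emitted" and with which temps.
struct AsmSketch {
    bool use_crc32 = true;             // models -XX:+UseCRC32
    std::vector<std::string> log;      // names of the emitted code paths
    std::array<int, 4> temps_seen{};   // temp register ids seen by the helper

    // Proposed helper: main args plus four scratch registers, no "table" naming.
    void kernel_crc32_using_crc32(Reg crc, Reg buf, Reg len,
                                  Reg tmp0, Reg tmp1, Reg tmp2, Reg tmp3) {
        (void)crc; (void)buf; (void)len;
        temps_seen = {tmp0.id, tmp1.id, tmp2.id, tmp3.id};
        log.push_back("crc32-instructions");
    }

    // The caller keeps its (simplified) signature and picks the temps.
    void kernel_crc32(Reg crc, Reg buf, Reg len,
                      Reg table0, Reg table1, Reg table2, Reg table3, Reg tmp) {
        if (use_crc32) {
            // The table registers are unused on this path, so lend them as scratch.
            kernel_crc32_using_crc32(crc, buf, len, table0, table1, table2, table3);
            return;
        }
        (void)tmp;
        log.push_back("table-driven");  // the remaining paths would stay here
    }
};
```

With this split, kernel_crc32() shrinks to a dispatch plus the non-UseCRC32 paths, and the same pattern would carry over to kernel_crc32c.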
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-
> bounces at openjdk.java.net] On Behalf Of Dmitry Chuyko
> Sent: Wednesday, October 11, 2017 12:31 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic
> Please review an improvement of CRC32 calculation on AArch64.
> MacroAssembler::kernel_crc32 gets table registers that are not used on
> -XX:+UseCRC32 path. They can be used to make neighbor loads and CRC
> calculations independent. Adding prologue and epilogue for main by-64 loop
> makes it applicable starting from len=128 so additional by-32 loop is added
> for smaller lengths.
> rfe: https://bugs.openjdk.java.net/browse/JDK-8189176
> webrev: http://cr.openjdk.java.net/~dchuyko/8189176/webrev.00/
> Results for T88 and A53 are good, but splitting pair loads may slow down
> other CPUs so measurements on different HW are highly welcome.
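
For readers following the loop structure described in the quoted message, here is a rough model in plain C++. It is only an illustration under assumptions, not the webrev's code: a bitwise software CRC-32 stands in for the hardware crc32 instructions, the 64-byte local buffers stand in for registers filled by loads, and the function names are made up. It shows a by-64 loop with a prologue and an epilogue that is only entered for len >= 128, a by-32 loop for what remains, and a byte tail.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320) over n bytes.
// Stands in for the hardware crc32 instructions; state is the raw register.
static std::uint32_t crc_bytes(std::uint32_t state, const std::uint8_t* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        state ^= p[i];
        for (int k = 0; k < 8; ++k)
            state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
    }
    return state;
}

// Straight byte-by-byte reference.
std::uint32_t crc32_reference(const std::uint8_t* p, std::size_t len) {
    return crc_bytes(0xFFFFFFFFu, p, len) ^ 0xFFFFFFFFu;
}

// Same result, structured as prologue / by-64 loop / epilogue / by-32 loop / tail.
std::uint32_t crc32_chunked(const std::uint8_t* p, std::size_t len) {
    std::uint32_t state = 0xFFFFFFFFu;
    if (len >= 128) {
        std::uint8_t block[64];
        std::memcpy(block, p, 64);                 // prologue: load the first block
        p += 64; len -= 64;
        while (len >= 64) {
            std::uint8_t next[64];
            std::memcpy(next, p, 64);              // load the next block ...
            state = crc_bytes(state, block, 64);   // ... while consuming the previous one
            std::memcpy(block, next, 64);
            p += 64; len -= 64;
        }
        state = crc_bytes(state, block, 64);       // epilogue: consume the last loaded block
    }
    while (len >= 32) {                            // by-32 loop for shorter inputs and leftovers
        state = crc_bytes(state, p, 32);
        p += 32; len -= 32;
    }
    state = crc_bytes(state, p, len);              // remaining tail bytes
    return state ^ 0xFFFFFFFFu;
}
```

In this model the load of the next block does not depend on the CRC of the previous one, which is the independence the patch is after; whether that helps on a given CPU is exactly what the requested measurements would show.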