RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics
Pengfei Li (Arm Technology China)
Pengfei.Li at arm.com
Mon Jan 21 10:53:47 UTC 2019
This patch vectorizes the AArch64 intrinsic for the Adler-32 checksum. An Adler-32 checksum is obtained by calculating two 16-bit checksums, s1 and s2, and concatenating their bits into a 32-bit integer. Details of the algorithm can be found on Wikipedia: https://en.wikipedia.org/wiki/Adler-32
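For reference, the scalar definition can be written as a short Java sketch. The class and method names are made up for this illustration and are not part of the patch: s1 starts at 1 and accumulates every byte, s2 accumulates every successive s1, both modulo 65521, and the result is s2 in the high 16 bits and s1 in the low 16 bits.

```java
import java.nio.charset.StandardCharsets;

final class Adler32Reference {
    // Largest prime smaller than 2^16, as defined by the Adler-32 algorithm.
    static final int BASE = 65521;

    // Byte-at-a-time Adler-32: s1 is 1 plus the sum of all bytes, s2 is the
    // sum of the successive s1 values, both taken modulo BASE.
    static int adler32(byte[] data) {
        int s1 = 1;
        int s2 = 0;
        for (byte b : data) {
            s1 = (s1 + (b & 0xFF)) % BASE; // treat Java's signed byte as unsigned
            s2 = (s2 + s1) % BASE;
        }
        // Concatenate the two 16-bit halves: s2 in the high bits, s1 in the low bits.
        return (s2 << 16) | s1;
    }

    public static void main(String[] args) {
        byte[] input = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("0x%08X%n", adler32(input)); // prints 0x11E60398
    }
}
```

The expected value is the worked example from the Wikipedia article: s1 = 920 (0x398) and s2 = 4582 (0x11E6).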
In the existing Adler-32 intrinsic written by Edward Nevill, the lower and upper halves of the checksum value, s1 and s2, are accumulated for every 16 bytes in nmax_loop and by16_loop. This patch vectorizes these accumulations with NEON instructions in both loops.
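The reason a 16-byte step can be vectorized is an algebraic identity: for a block b[0..15] entering with s1 and s2, the result is s1' = s1 + sum(b[i]) and s2' = s2 + 16*s1 + sum((16 - i)*b[i]). Both sums are independent per byte lane, which is the kind of work SIMD lanes can do in parallel. The Java sketch below emulates that per-lane computation with a scalar loop; it is an illustration of the math, not the patch's actual NEON instruction sequence. The names are hypothetical, and NMAX = 5552 is borrowed from zlib as the usual bound on how many bytes can be summed before a modulo reduction is needed.

```java
import java.util.Random;
import java.util.zip.Adler32;

final class Adler32Blocked {
    static final int BASE = 65521;
    // zlib's NMAX: largest n with 255*n*(n+1)/2 + (n+1)*(BASE-1) <= 2^32-1,
    // i.e. bytes that fit before a reduction is required (347 blocks of 16).
    static final int NMAX = 5552;
    static final int BLOCK = 16;

    // Per-lane weights 16, 15, ..., 1 for the s2 contribution of a block.
    static final int[] WEIGHTS = new int[BLOCK];
    static {
        for (int i = 0; i < BLOCK; i++) WEIGHTS[i] = BLOCK - i;
    }

    static int adler32(byte[] data) {
        // long accumulators keep this sketch simple; native code would rely on
        // the NMAX bound to stay within 32-bit lanes.
        long s1 = 1, s2 = 0;
        int pos = 0;
        while (data.length - pos >= BLOCK) {
            // Process up to NMAX bytes of whole 16-byte blocks without reducing.
            int end = pos + Math.min(NMAX, (data.length - pos) / BLOCK * BLOCK);
            for (; pos < end; pos += BLOCK) {
                long byteSum = 0, weightedSum = 0;
                // Lanes are independent: this inner loop is what SIMD parallelizes.
                for (int i = 0; i < BLOCK; i++) {
                    int b = data[pos + i] & 0xFF;
                    byteSum += b;
                    weightedSum += (long) WEIGHTS[i] * b;
                }
                // s2 absorbs 16 copies of the incoming s1 plus the weighted sum.
                s2 += BLOCK * s1 + weightedSum;
                s1 += byteSum;
            }
            s1 %= BASE;
            s2 %= BASE;
        }
        // Tail of fewer than 16 bytes, processed one byte at a time.
        for (; pos < data.length; pos++) {
            s1 = (s1 + (data[pos] & 0xFF)) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        return (int) ((s2 << 16) | s1);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_003];
        new Random(1).nextBytes(data);
        Adler32 jdk = new Adler32();
        jdk.update(data, 0, data.length);
        System.out.println(adler32(data) == (int) jdk.getValue()); // prints true
    }
}
```

Deferring the modulo to once per NMAX bytes, rather than once per byte, is what keeps the block loop free of divisions.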
I tested the correctness of the patch by comparing the checksums of 5000 byte arrays, each 1 MB in size. The test code and script can be found at .
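The actual test code is the one referenced above. As a separate illustration only, a differential check of this kind could compare java.util.zip.Adler32, which the JIT may replace with the intrinsic once the method is compiled, against a pure-Java scalar reference on random 1 MB arrays. The class name, seed, and array count are assumptions; a baseline run could also be made with -XX:-UseAdler32Intrinsics.

```java
import java.util.Random;
import java.util.zip.Adler32;

final class Adler32DiffTest {
    static final int BASE = 65521;

    // Pure-Java scalar reference that is never affected by JIT intrinsics.
    static long reference(byte[] data) {
        long s1 = 1, s2 = 0;
        for (byte b : data) {
            s1 = (s1 + (b & 0xFF)) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        return (s2 << 16) | s1;
    }

    // Checks `count` random arrays of `size` bytes and returns the mismatch count.
    static int run(int count, int size, long seed) {
        Random rnd = new Random(seed);
        byte[] data = new byte[size];
        int mismatches = 0;
        for (int n = 0; n < count; n++) {
            rnd.nextBytes(data);
            Adler32 adler = new Adler32();
            adler.update(data, 0, data.length);
            if (adler.getValue() != reference(data)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Hypothetical sizing: 50 arrays of 1 MB (the original test used 5000).
        int mismatches = run(50, 1 << 20, 42L);
        System.out.println(mismatches == 0 ? "all checksums match" : mismatches + " mismatches");
    }
}
```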
I also measured performance with and without the patch using a JMH benchmark . The JMH results show a speedup of about 2.5x.
More information about the hotspot-compiler-dev