RFR(M): 8189112 - AARCH64: optimize StringUTF16 compress intrinsic
dmitrij.pochepko at bell-sw.com
Tue May 8 13:26:43 UTC 2018
Please review the patch for 8189112 - AARCH64: optimize StringUTF16 compress intrinsic
This patch is based on 3 improvement ideas:
- introduction of an additional large loop with a prefetch instruction for
- a different compression implementation, using uzp1 and uzp2 instructions
instead of uqxtn and uqxtn2, which are more expensive. It also allows
dropping direct FPSR register operations, which are very slow on some CPUs.
- a slightly different code shape, which mostly executes branches and
independent operations while loads and stores are in flight (helps "in-order"
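
For readers less familiar with the instructions named above, here is a rough Java model of the second idea. It is only a sketch under stated assumptions, not the stub code from the patch: the class and method names are hypothetical, the scalar routine assumes the "return len on success, 0 on the first char above 0xFF" contract, and uzp1/uzp2 are modelled on a single byte array holding the concatenated source registers. The point it illustrates is that, for little-endian UTF-16 data, the even bytes are the Latin-1 candidates and the odd bytes must all be zero, so no saturation flag has to be read back.

```java
// Illustrative sketch only; all names here are hypothetical, not JDK code.
final class CompressSketch {

    // Scalar reference: copy chars to bytes while every char fits in Latin-1.
    // Assumed contract: returns len on success, 0 on the first char > 0xFF.
    static int compressScalar(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[srcOff + i];
            if (c > 0xFF) {
                return 0;
            }
            dst[dstOff + i] = (byte) c;
        }
        return len;
    }

    // Models UZP1 on byte lanes: keeps the even-indexed bytes.
    // The real instruction reads two registers; here they are one array.
    static byte[] uzp1(byte[] v) {
        byte[] out = new byte[v.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = v[2 * i];
        }
        return out;
    }

    // Models UZP2 on byte lanes: keeps the odd-indexed bytes.
    static byte[] uzp2(byte[] v) {
        byte[] out = new byte[v.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = v[2 * i + 1];
        }
        return out;
    }

    // Lays chars out as little-endian UTF-16 bytes, as they sit in memory.
    static byte[] toUtf16LE(char[] chars) {
        byte[] raw = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            raw[2 * i] = (byte) chars[i];            // low byte
            raw[2 * i + 1] = (byte) (chars[i] >>> 8); // high byte
        }
        return raw;
    }

    // Compresses one block the "unzip" way: the block is compressible
    // only if the OR of all high bytes is zero. Returns null otherwise.
    static byte[] compressBlock(char[] chars) {
        byte[] raw = toUtf16LE(chars);
        int acc = 0;
        for (byte b : uzp2(raw)) {
            acc |= b;
        }
        return acc == 0 ? uzp1(raw) : null;
    }
}
```

For an all-Latin-1 input, compressBlock yields the same bytes as compressScalar; for an input containing a char above 0xFF it returns null where the scalar version returns 0.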
Benchmarks: I created a JMH benchmark with a direct call via reflection:
Tested CPUs: ThunderX, ThunderX2, Cortex A73.
Performance results summary:
ThunderX: 3-5% improvement on small strings on average, x1.65 (40%) on
ThunderX2: same results on strings with length <8, up to x1.65 (40%) for
size 8..64, about x4 (80%) improvement for large strings
Cortex A73: up to 8% on small strings, up to x1.65 (40%) on large strings
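
A note on reading the paired figures above: one interpretation consistent with "x1.65 (40%)" is that the factor is a speedup and the percentage is the corresponding reduction in execution time, i.e. 1 - 1/speedup. That reading is an assumption, not something stated in this message, and the helper below is a hypothetical name used only to show the arithmetic.

```java
// Illustrative arithmetic only; the interpretation is an assumption.
final class SpeedupMath {

    // Converts a speedup factor (old time / new time) into the
    // percentage by which execution time shrinks: (1 - 1/speedup) * 100.
    static double timeReductionPercent(double speedup) {
        if (speedup <= 0.0) {
            throw new IllegalArgumentException("speedup must be positive");
        }
        return (1.0 - 1.0 / speedup) * 100.0;
    }
}
```

Under this reading, a speedup of 1.65 corresponds to a run time roughly 39% shorter, in line with the quoted 40%.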
A detailed results table can be found here:
- hotspot jtreg tests using release build: ./compiler/*, ./gc/* and
- hotspot jtreg tests using fastdebug build: ./compiler/*
No new failures found