RFR 8203628: Optimize (masked) byte memory comparisons on x86
shade at redhat.com
Tue May 22 15:10:54 UTC 2018
We found this during Shenandoah performance work. There, the GC barrier fastpath checks a
thread-local byte field to see if barriers are enabled. The check takes different shapes, e.g.:
(Set cr (CmpI (LoadB mem) imm))
(Set cr (CmpI (LoadUB mem) imm))
(Set cr (CmpI (AndI (LoadUB mem) imm) zero))
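For illustration only, here is a minimal Java sketch of source-level checks that would be expected to reach C2 in roughly these three shapes. The class, field, and method names are hypothetical, and the real barrier checks are emitted by the compiler rather than written in Java; whether C2 produces exactly these nodes for this code is an assumption that depends on the JDK build.

```java
// Hypothetical illustration; none of these names exist in HotSpot.
final class ByteCheckShapes {
    // Stand-in for the thread-local byte flag that the barrier fastpath reads.
    static byte flag;

    // Signed byte load compared with an immediate,
    // roughly (CmpI (LoadB mem) imm).
    static boolean equalsSigned(int imm) {
        return flag == imm;
    }

    // Zero-extended byte load compared with an immediate,
    // roughly (CmpI (LoadUB mem) imm), assuming the 0xFF mask is folded into the load.
    static boolean equalsUnsigned(int imm) {
        return (flag & 0xFF) == imm;
    }

    // Masked check against zero,
    // roughly (CmpI (AndI (LoadUB mem) imm) zero).
    static boolean anyBitSet(int mask) {
        return ((flag & 0xFF) & mask) != 0;
    }

    public static void main(String[] args) {
        flag = (byte) 0x80;
        System.out.println("signed == -128:   " + equalsSigned(-128));
        System.out.println("unsigned == 128:  " + equalsUnsigned(128));
        System.out.println("bit 0x80 set:     " + anyBitSet(0x80));
    }
}
```

The signed and unsigned variants differ only for byte values with the top bit set, which is why they show up as separate match shapes.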
The closest code shape in the current JDK is the G1 SATB pre-barrier. This is how the G1 SATB
fastpath looked before this change:
0x00: movsbl 0x30(%r15),%r11d ; is SATB active?
0x05: test %r11d,%r11d
0x08: jne 0x00007f22d889834f
0x0e: mov 0x10(%r14),%r9d ; actual store
...and after:
0x00: cmpb $0x0,0x30(%r15) ; is SATB active?
0x05: jne 0x00007fb57f7f6baf
0x0b: mov 0x10(%r14),%r9d ; actual store
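As a sanity check on why the temp-free form is legal, the following Java sketch models both instruction sequences and compares them exhaustively over all byte values. It is a model, not HotSpot code: the class and method names are hypothetical, and the masked variant assumes the memory form is a byte-wide test with an 8-bit immediate (such as testb), which is an assumption about the generated code rather than something shown in the listings above.

```java
// Hypothetical model of the instruction sequences; not HotSpot code.
final class ByteTestEquivalence {
    // Models "movsbl mem,reg; test reg,reg; jne": sign-extend into a temp, then test it.
    static boolean viaTemp(byte b) {
        int tmp = b;               // movsbl: sign-extending load into a register
        return (tmp & tmp) != 0;   // test reg,reg sets ZF from reg AND reg
    }

    // Models "cmpb $0x0,mem; jne": compare the byte in memory with zero directly.
    static boolean viaMemCompare(byte b) {
        return b != 0;
    }

    // Models the masked shape with a temp: zero-extending load, AND with the mask, test.
    static boolean maskedViaTemp(byte b, int mask) {
        int tmp = b & 0xFF;
        tmp &= mask;
        return tmp != 0;
    }

    // Models an assumed byte-wide test against memory: only the low 8 bits take part.
    static boolean maskedViaMemTest(byte b, int mask) {
        return ((byte) (b & mask)) != 0;
    }

    // Checks every byte value, and every 8-bit mask for the masked variants.
    static boolean allAgree() {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            if (viaTemp(b) != viaMemCompare(b)) return false;
            for (int mask = 0; mask <= 0xFF; mask++) {
                if (maskedViaTemp(b, mask) != maskedViaMemTest(b, mask)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("sequences agree on all inputs: " + allAgree());
    }
}
```

The model only covers the branch condition; it says nothing about code size or register pressure, which are the actual motivation for the change.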
So, currently this is compiled using a temporary register, when we could use an indirect memory
operand and avoid the temp. It gets worse when a mask is involved, because we would have another
addl on this path. The optimization works well when GC barriers are frequent and/or register
pressure is high, because we get both denser code and more available registers. When registers are
abundant and CPU
The impact on SPECjvm running with G1 is barely over the margin of error on the positive side, which
seems to align with the idea that SATB barriers are not very frequent. It is different for
Shenandoah, though, which has many more barriers.
Testing: Linux x86_64 tier1, SPECjvm and targeted benchmarks, jdk-submit (running); a variant of
this improvement has been tested in Shenandoah for about 5 months now.