RFR: Match barrier fastpath checks better

Aleksey Shipilev shade at redhat.com
Tue Jan 9 15:28:52 UTC 2018

(Roland made the draft revision of this patch last year)

The current barrier fastpath checks the flag like this:

   0x0: movzbl 0x3d8(%r15),%r10d ; check evac-in-progress
  +0x8: test   %r10d,%r10d
  +0xB: jne    SLOW-PATH
 +0x11: ...

This wastes a register (%r10 in this example), which is bad when barriers are back-to-back and
register pressure is high. The fix folds byte-sized comparisons of memory against immediates into
cmpb, so the resulting code needs no register and is shorter:

   0x0: cmpb   $0x0,0x3d8(%r15)
  +0x8: jne    SLOW-PATH
  +0xE: ...
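
To make "shorter" concrete, here is a small sketch that derives the instruction sizes from the
offsets printed in the two listings above. The helper names are made up for illustration, and the
only inputs are the offsets already shown.

```python
# Offsets copied from the two listings above; each list ends with the
# offset of the first instruction after the check ("next").
before = [("movzbl", 0x0), ("test", 0x8), ("jne", 0xB), ("next", 0x11)]
after = [("cmpb", 0x0), ("jne", 0x8), ("next", 0xE)]


def instruction_sizes(listing):
    """Return {mnemonic: size in bytes}, derived from consecutive offsets."""
    return {
        name: next_off - off
        for (name, off), (_, next_off) in zip(listing, listing[1:])
    }


def total_size(listing):
    """Size in bytes of the whole check: last offset minus first offset."""
    return listing[-1][1] - listing[0][1]


sizes_before = instruction_sizes(before)
sizes_after = instruction_sizes(after)
saved = total_size(before) - total_size(after)

print(sizes_before, total_size(before))
print(sizes_after, total_size(after))
print("bytes saved per check:", saved)
```

By these offsets the check shrinks from 17 to 14 bytes and from three instructions to two, on top
of freeing the scratch register.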

This follows similar .ad patterns that fold particular cmp shapes, and the fix would be upstreamed
separately. We would like to have this in the Shenandoah repos for more thorough testing. The
"unsigned" shape covers Shenandoah WB checks, and the "signed" shape covers SATB checks. (Amusingly,
this affects C2, but not C1, which already generates cmpb for cases like these.) We actually need
only tests against zero, but nothing prevents us from matching the entire range of byte values.
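
As a sanity check on the "entire range of bytes" remark, the sketch below models why an
equality test on a loaded byte can be replaced by a plain byte compare for every immediate that
fits in a byte. It assumes the "unsigned" shape means a zero-extending load with immediates 0..255
and the "signed" shape means a sign-extending load with immediates -128..127. Those ranges and all
function names are assumptions for illustration, not taken from the patch, and only the
equal/not-equal outcome (what jne consumes) is modeled.

```python
def zero_extend(byte):
    """Value of an 8-bit pattern after an unsigned (movzbl-style) load."""
    return byte & 0xFF


def sign_extend(byte):
    """Value of an 8-bit pattern after a signed (movsbl-style) load."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def byte_compare_equal(byte, imm):
    """Model of cmpb + je/jne: compare the raw 8-bit pattern with an 8-bit immediate."""
    return (byte & 0xFF) == (imm & 0xFF)


# Unsigned shape (assumed): load-and-compare agrees with cmpb for imm in 0..255.
unsigned_ok = all(
    (zero_extend(b) == imm) == byte_compare_equal(b, imm)
    for b in range(256)
    for imm in range(0, 256)
)

# Signed shape (assumed): load-and-compare agrees with cmpb for imm in -128..127.
signed_ok = all(
    (sign_extend(b) == imm) == byte_compare_equal(b, imm)
    for b in range(256)
    for imm in range(-128, 128)
)

print(unsigned_ok, signed_ok)
```

Outside those ranges the fold would be wrong: for example, a zero-extended byte never equals 256,
but the truncated byte compare would report a match for a zero byte, so a matching rule would have
to restrict the immediate to byte-sized values.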

Regular benchmarks are affected very little, with some tiny improvements, because barriers there
are already well-optimized. But where barriers are not optimized (or not optimizable), the
improvement is substantial. For example, in recent SPSCQueue benchmarks [1], the score improved by
around 50%.

Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm


[1] http://cr.openjdk.java.net/~shade/shenandoah/jctools-QueueThroughputBackoffNone.txt
