RFR(XL): 8224675: Late GC barrier insertion for ZGC

Roman Kennke rkennke at redhat.com
Fri May 24 09:42:09 UTC 2019

Hi Nils,

> I removed the method. I have verified that Shenandoah builds and isn't
> obviously broken.
> Webrev updated in place.

Thanks! Will review more thoroughly and run some tests later.

Nice approach, btw!


> Regards,
> Nils
> On 2019-05-23 16:31, Roman Kennke wrote:
>> Quick glance showed a problem: you are renaming/moving
>> BarrierSetC2::add_users_to_worklist() but ShenandoahBarrierSetC2 is not
>> updated accordingly.
>> Roman
>>> Hi,
>>> In ZGC we use load barriers on references. In the original
>>> implementation these where added as macro nodes at parse time. The load
>>> barrier node consumes and produces control flow in order to be able to
>>> be lowered into a check with a slow path late. The load barrier nodes
>>> are fixed in the control flow, and extensions to different optimizations
>>> are need the barriers out of loop and past other unrelated control flow.
>>> With this patch the barriers are instead added after the loop
>>> optimizations, before macro node expansion. This makes the entire
>>> pipeline until that point oblivious about the barriers. A dump of the IR
>>> with ZGC or EpsilonGC will be basically identical at that point, and the
>>> diff compared to serialGC or ParallelGC that use write barriers is
>>> really small.
>>> Benefits
>>> - A major complexity reduction. One can reason about and implement loop
>>> optimization without caring about the barriers. The escape analysis
>>> doesn't need to know about the barriers. Loads float freely like they
>>> are supposed to.
>>> - Less nodes early. The inlining will become more deterministic. A
>>> barrier heavy GC will not run into node limits earlier. Also node limit
>>> bounded optimization like unrolling and peeling will not be penalized by
>>> barriers.
>>> - Better test coverage, or reduce testing cost when the same
>>> optimization doesn't need to be verified with every GC.
>>> - Better control on where barriers end up. It is trivial to guarantee
>>> that the load and barriers are not separated by a safepoint.
>>> Design
>>> The implementation uses an extra phase that piggy back on PhaseIdealLoop
>>> which provides control and dominator information for all loads. This
>>> extra phase is needed because we need to splice the control flow when
>>> adding the load barriers.
>>> Barriers are inserted on the loads nodes in post order (any successor
>>> first). This is to guarantee the dominator information above every
>>> insertion is correct. This is also important within blocks. Two loads in
>>> the same block can float in relation to each other. The addition of
>>> barriers serializes their order. Any def-use relationship is upheld by
>>> expanding them post order.
>>> Barrier insertion is done in stages. In this first stage a single macro
>>> node that represents the barrier is added with all dependencies that is
>>> required. In the macro expansion phase the barrier nodes is expanded
>>> into the final shape, adding nodes that represent the conditional load
>>> barrier check. (Write barriers in other GCs could possibly be expanded
>>> here directly)
>>> All the barriers that are needed for unsafe reference operations (cas,
>>> swap, cmpx) are also expanded late. They already have control flow, so
>>> the expansion is straight forward.
>>> The barriers for the unsafe reference operations (cas, getandset, cmpx)
>>> have also been simplified. The cas-load-cas dance have been replaced by
>>> a pre-load. The pre-load is a load with a barrier, that is kept alive by
>>> an extra (required) edge on the unsafe-primitive-nodes (specialized as
>>> ZCompareAndSwap, ZGetAndSet, ZCompareAndExchange).
>>> One challenge that was encountered early and that have caused
>>> considerable work is that nodes (like loads) can end up between calls
>>> and their catch projections. This is usually handled after matching, in
>>> PhaseCFG::call_catch_cleanup, where the nodes after the call are cloned
>>> to all catch blocks. At this stage they are in an ordered list, so that
>>> is a straight forward process. For late barrier insertion we need to
>>> splice in control earlier, before matching, and control flow between
>>> calls and catches is not allowed. This requires us to add a
>>> transformation pass where all loads and their dependent instructions are
>>> cloned out to the catch blocks before we can start splicing in control
>>> flow. This transformation doesn't replace the legacy call_catch_cleanup
>>> fully, but it could be a future goal.
>>> In the original barrier implementation there where two different load
>>> barrier implementations: the basic and the optimized. With the new
>>> approach to barriers on unsafe, the basic is no longer required and has
>>> been removed. (It provided options for skipping the self healing, and
>>> passed the ref in a register, guaranteeing that the oop wasn't
>>> reloaded.)
>>> The wart that was fixup_partial_loads in zHeap has also been made
>>> redundant.
>>> Dominating barriers are no longer removed on weak loads. Weak barriers
>>> doesn't guarantee self-healing.
>>> Follow up work:
>>> - Consolidate all uses of GrowableArray::insert_sorted to use the new
>>> version
>>> - Refactor the phases. There are a lot of simplifications and
>>> verification that can be done with more well defined phases.
>>> - Simplify the remaining barrier optimizations. There might still be
>>> code paths that are no longer needed.
>>> Testing:
>>> Hotspot tier 1-6, CTW, jcstress, micros, runthese, kitchensink, and then
>>> some. All with -XX:+ZVerifyViews.
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8224675
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8224675/webrev.01/
>>> Please review,
>>> Regards,
>>> Nils

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20190524/77ebac0c/signature.asc>

More information about the hotspot-compiler-dev mailing list