Redundant barrier elimination
dl at cs.oswego.edu
Wed Feb 12 10:44:53 PST 2014
On 02/12/2014 11:20 AM, Lindenmaier, Goetz wrote:
> during the PPC port, we encountered some problems with the current
> representation of the barriers.
> 1.) We can do load-acquire (ld-twi-isync) on ppc. Therefore we implement
> MemBarAcquire empty.
> But there were places where MemBarAcquire is issued without
> corresponding to a dedicated load. To distinguish this, we introduced
> Further, there are graphs where a ld.acq is followed by a membar instruction
> (sync or lwsync), in this case we can omit the -twi-isync. We check this
> during matching by calling followed_by_acquire() in the matcher predicate.
> (Comparable to Matcher::post_store_load_barrier().)
First, a disclaimer: I have amateur status in C2, so I could
easily be wrong.
In principle, even on processors with fused fence+access
instructions, the longer you defer fusing them, the more likely
you can get rid of the fences. So it would seem better all around
to keep the fences separate and do the smarter matching only
during instruction generation. This also simplifies other C2 passes,
because the fences themselves act as reordering constraints. Keeping
the four kinds of fences separate also aids optimization.
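
To make the "keep fences separate, fuse late" idea concrete, here is a
toy sketch in Python. It is not C2 code. The names (Fence, Access,
merge_adjacent_fences) are made up for illustration, and I am assuming
the four fence kinds are the usual LoadLoad/LoadStore/StoreLoad/StoreStore
set. With fences as explicit nodes, two fences with no memory access
between them can be folded into one fence carrying the union of their
constraints, so an acquire-style fence directly followed by a full fence
costs nothing extra.

```python
from dataclasses import dataclass

# Assumed fence kinds, encoded as bit flags (illustrative, not C2's node types).
LOAD_LOAD, LOAD_STORE, STORE_LOAD, STORE_STORE = 1, 2, 4, 8
ACQUIRE = LOAD_LOAD | LOAD_STORE
FULL = LOAD_LOAD | LOAD_STORE | STORE_LOAD | STORE_STORE


@dataclass(frozen=True)
class Fence:
    kinds: int  # bitwise OR of the flags above


@dataclass(frozen=True)
class Access:
    op: str   # "load" or "store"
    var: str


def merge_adjacent_fences(instrs):
    """Fold runs of adjacent fences into a single fence.

    With no memory access between two fences, one fence carrying the
    union of their kinds imposes the same ordering constraints.
    """
    out = []
    for ins in instrs:
        if isinstance(ins, Fence) and out and isinstance(out[-1], Fence):
            out[-1] = Fence(out[-1].kinds | ins.kinds)
        else:
            out.append(ins)
    return out


# A load with a separate acquire fence, immediately followed by a full fence.
prog = [Access("load", "x"), Fence(ACQUIRE), Fence(FULL), Access("store", "y")]
result = merge_adjacent_fences(prog)
```

Here result holds the load, one full fence, and the store. Had the load
and its acquire fence been fused into a single load-acquire node up
front, a pass like this would have nothing to merge.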
Pushing this a little harder, it may also make sense to do
this for CompareAndSwap* nodes, and maybe even *Lock nodes,
rather than implicitly assuming their fence properties,
especially since some processors have different modes
(acquire/release/full) or LL/SC idioms with these effects.
It is possible/likely that JMM revisions and related JEPs
will expose these.
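
As a rough illustration of what explicit fence properties on a CAS
might look like, here is a toy Python sketch. The function lower_cas
and the tuple encoding are hypothetical, and the mapping from mode to
fences is my assumption (acquire as LoadLoad+LoadStore after the
operation, release as LoadStore+StoreStore before it, full as a
conservative full fence on both sides), not how C2 represents these
nodes.

```python
# Assumed fence kinds as bit flags (illustrative only).
LOAD_LOAD, LOAD_STORE, STORE_LOAD, STORE_STORE = 1, 2, 4, 8
ACQUIRE = LOAD_LOAD | LOAD_STORE
RELEASE = LOAD_STORE | STORE_STORE
FULL = LOAD_LOAD | LOAD_STORE | STORE_LOAD | STORE_STORE


def lower_cas(var, mode):
    """Expand a CAS into a fence-free CAS plus explicit fence entries.

    Returns a list of ("cas", var) and ("fence", kinds) tuples, so that
    later passes can see, merge, or drop the fences like any others.
    """
    raw = ("cas", var)
    if mode == "acquire":
        return [raw, ("fence", ACQUIRE)]
    if mode == "release":
        return [("fence", RELEASE), raw]
    if mode == "full":
        return [("fence", FULL), raw, ("fence", FULL)]
    raise ValueError("unknown mode: " + mode)


acq = lower_cas("x", "acquire")
full = lower_cas("x", "full")
```

On a processor with a native acquire or release form of the operation,
instruction selection could then match the raw CAS together with a
surviving neighboring fence, instead of every CAS implying a full fence.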
> 4.) We think that in do_exits() a MemBarStoreStore suffices.
(Yes, it is likely that this will become the officially
sanctioned strategy across all processors.)