RFR: AArch64 -- patch to make volatile reads and stores use ldar/stlr
vladimir.kozlov at oracle.com
Wed Apr 22 22:59:02 UTC 2015
First, I am fine with your changes. I thought that may be adding some
shared code (flag?) will simplify (reduce) your change. But based on
your answer it is not the case. So I am fine with you changes.
On 4/22/15 2:00 AM, Andrew Dinn wrote:
> Hi Vladimir,
> Thanks for responding.
> On 21/04/15 18:03, Vladimir Kozlov wrote:
>> Would it help if we add a new flag to NodeFlags to indicate volatile
>> memory node?
> Yes, I think this would provide some benefit although I don't know if it
> will justify cost of implementing and testing the necessary shared code
> changes given that the benefit is only partial. Even with such a flag we
> would still need to do some of the work that this patch does. Also, the
> flag would not help with a the next optimization we are planning
> (removing unnecessary barriers from CAS sequences).
> Your suggestion will /simplify/ the predicates for LoadX/StoreX but will
> not make much difference to execution time. The current predicates for
> these nodes, needs_acquiring_load() and needs_releasing_store(), check
> immediately whether a LoadX is acquiring or a StoreX is releasing. So,
> we have a quick filter for the /common path/ case where we simply
> generate an ldr<x> or stlr<x> (I assume volatile ops are relatively rare).
> However, we cannot rely solely on these existing flags as they are
> sometimes set in other (non-volatile) cases. Currently, if the acquiring
> or releasing flag is set the predicates go on to investigate the nodes
> around the LoadX/StoreX looking for a specific configuration of
> MemBar_XXX nodes. Adding the volatile flag would remove the need to do
> this extra work in the /uncommon/ case.
> Unfortunately, the other half of the patch involves eliding generation
> of dmb instructions for the MemBar_XXX nodes associated with the a LoadX
> or StoreX. Three other predicates are used for this purpose,
> unnecesary_membar_acquire, unnecesary_membar_release and
> unnecesary_membar_volatile. They search the ideal graph around the
> membar to see whether it belongs to a signature sequence containing an
> acquiring load or releasing store.
> The check involves i) finding a related LoadX or StoreX, ii) checking it
> is acquiring or releasing and ii) ensuring it is either tailed or
> bracketed by a node group with a 'signature' shape which includes the
> MemBar we started from. A volatile flag would not really speed up this
> check and would not do much to simplify it.
> The current predicates test for very specific signature configurations.
> So, in cases where the MemBar node is /not/ part of a signature volatile
> sequence they detect this very rapidly. In the common case the predicate
> immediately fails to find the required sort of parent/child node in the
> tree surrounding the initial MemBar (i.e. one which is not a
> LoadX/StoreX, Proj or MemBar). If we do arrive at a LoadX/StoreX in the
> node tree we could, perhaps, check the volatile flag and short-circuit
> any further graph traversal but the bulk of the work involves traversing
> from the initial MemBar to the LoadX/StoreX.
> n.b. I say perhaps because I am not yet certain that we just test the
> volatile flag. I don't know for sure that we could not arrive at the
> memory node from a MemBar which was not part of the signature sequence
> (I /am/ very confident that the current checks correctly identify
> volatile node sequences and only those sequences).
> Could we punt this question of providing the volatile flag, considering
> it as a potential follow-up improvement? I'd prefer to get this
> AArch64-only change in as is -- or, at least, with some tweaking --
> before going on to consider something which requires changes to shared code.
> Also, I mentioned the CAS case above because that is what I am looking
> at next and it extends the technique used in the current patch.
> Currently CAS operations translate to something which looks like this
> cbz retry
> The ldaxr/stlxr pair are generated in response to a CompareAndSwapX node
> and the dmb instructions arise from MemBar nodes. For a CAS AArch64 can
> safely elide the dmb instructions. So, the next improvement will be to
> extend the unnecessary_membar_xxx predicates to elide the dmb generation
> when a suitable configuration of MemBar_XXX and CompareAndSwapX nodes is
> seen. Of course, this test will not be helped at all by provision of a
> volatile flag.
> Andrew Dinn
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
More information about the hotspot-dev