RFR: 8006971: More missing barriers in taskqueues for RMO architectures
vladimir.danushevsky at oracle.com
Mon Aug 5 07:43:57 PDT 2013
On Aug 3, 2013, at 6:53 AM, David Holmes wrote:
> On 2/08/2013 11:57 PM, Vladimir Danushevsky wrote:
>> The issue of missing memory barriers in the GC taskqueue code was first flagged here:
>> JDK7u40 fix for the same issue is located here:
>> Initially we planned to port the same solution to JDK8; however, after reviewing the algorithm further, we started questioning the need for a full fence between the 'age' and 'bottom' elements. Since the intent is to keep the 'bottom' memory reference from being executed before the 'age' one, would a LoadLoad barrier (which in many cases is a cheaper solution) be sufficient? If so, the webrev below could possibly be an adequate solution.
>> We have tested both cases (fence and LL) on a hexa-core Power5 box running several test suites that currently expose the problem. The results are positive.
> The loadload() should not be in any ifdef. The loadload() is part of the algorithmic correctness. The loadload() will become a no-op on any platform that does not need to do anything special to preserve the ordering.
As I understand it, LoadLoad ordering is not an issue on the platforms that the provided patch excludes from emitting the barrier.
However, I went on to read the further discussion at
and it seems the concern is that the Store is not guaranteed to have propagated to all observers if it is read before the Writer's side 'sync'. I speculate that this might not be an issue on PowerPC implementations with L1 cache snooping, but even if that is the case there is no robust way to detect it at runtime.
But that is likely not an issue on ARM (not sure about IA64, as it was listed in the very first webrev from Goetz); therefore we might inject OrderAccess::fence() for PPC (both 32- and 64-bit) and OrderAccess::loadload() for ARM (again, need info on IA64).
That being said, for simplicity we can go with fence() in the ARM case too, since current ARMv7 implementations do not have a separate barrier instruction for loads.
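To make the barrier placement concrete, here is a minimal sketch of the steal path in portable C++11. It is not the HotSpot code and not the JDK7u40 patch: SketchQueue, kCapacity, age_top and the use of std::atomic are assumptions made for illustration, the tag half of 'age' is left out, and the owner-side pop_local (where the Writer's side 'sync' lives) is omitted. std::atomic_thread_fence(std::memory_order_seq_cst) stands in for OrderAccess::fence().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the GC task queue (illustration only).
struct SketchQueue {
  static constexpr uint32_t kCapacity = 8;   // assumed size, power of two

  std::atomic<uint32_t> age_top{0};          // steal end ('age' without its tag)
  std::atomic<uint32_t> bottom{0};           // owner end
  std::atomic<int> elems[kCapacity];

  // Owner side: write the element, then publish it by advancing 'bottom'.
  bool push(int v) {
    uint32_t b = bottom.load(std::memory_order_relaxed);
    uint32_t t = age_top.load(std::memory_order_acquire);
    if (b - t >= kCapacity) return false;    // queue is full
    elems[b % kCapacity].store(v, std::memory_order_relaxed);
    bottom.store(b + 1, std::memory_order_release);
    return true;
  }

  // Thief side: 'age' must be read before 'bottom'.
  bool pop_global(int& out) {
    uint32_t t = age_top.load(std::memory_order_relaxed);
    // Full fence between the two loads. The cheaper alternative discussed
    // in this thread (loadload) would roughly correspond to
    // std::atomic_thread_fence(std::memory_order_acquire); that mapping is
    // an approximation, not a statement about OrderAccess.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    uint32_t b = bottom.load(std::memory_order_acquire);
    if (b == t) return false;                // nothing to steal
    int v = elems[t % kCapacity].load(std::memory_order_relaxed);
    // Claim the element; fails if another thief got there first.
    if (!age_top.compare_exchange_strong(t, t + 1)) return false;
    out = v;
    return true;
  }
};
```

Without the fence, a weakly ordered CPU may satisfy the 'bottom' load before the 'age' load, so the thief could pair a stale 'bottom' with a fresh 'age'. Whether the weaker barrier is enough depends on the propagation concern described above.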
So, in other words, use the same patch as in JDK7u40:
>> As a side note -
>> perhaps it is possible to eliminate the potential age/bottom reordering by loading both simultaneously through an Atomic class method. That would, though, require some structural changes to the layout of the TaskQueueSuper class to place both fields together and ensure proper integer alignment (depending on the 32/64-bit port); therefore this solution is less practical for the short term.
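A rough sketch of that side note, assuming both indices fit in 32 bits each and share one naturally aligned 64-bit word. PackedIndices and its members are hypothetical names, not part of TaskQueueSuper, and std::atomic is used here in place of the HotSpot Atomic class. A 64-bit std::atomic may not be lock-free on every 32-bit port, which is the alignment and width concern mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout: 'age' in the high half, 'bottom' in the low half.
struct PackedIndices {
  std::atomic<uint64_t> word{0};

  static uint64_t pack(uint32_t age, uint32_t bottom) {
    return (static_cast<uint64_t>(age) << 32) | bottom;
  }
  static uint32_t age_of(uint64_t w)    { return static_cast<uint32_t>(w >> 32); }
  static uint32_t bottom_of(uint64_t w) { return static_cast<uint32_t>(w); }

  // One atomic load yields a consistent (age, bottom) pair, so there is
  // no second load that could be reordered ahead of the first.
  void snapshot(uint32_t& age, uint32_t& bottom) const {
    uint64_t w = word.load(std::memory_order_acquire);
    age = age_of(w);
    bottom = bottom_of(w);
  }

  // In portable C++ the price is that updating one half needs a CAS loop,
  // so the other half is not overwritten with a stale value.
  void set_bottom(uint32_t new_bottom) {
    uint64_t old_w = word.load(std::memory_order_relaxed);
    while (!word.compare_exchange_weak(old_w,
                                       pack(age_of(old_w), new_bottom),
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
      // old_w now holds the current value; retry with it.
    }
  }
};
```

The owner's updates of 'bottom' are plain stores today; turning them into a CAS loop as in this sketch would add cost on the hot path, which is one more reason to treat the idea as a longer-term option.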