RFR: 8006971: More missing barriers in taskqueues for RMO architectures

Vladimir Kozlov vladimir.kozlov at oracle.com
Mon Aug 5 23:02:38 PDT 2013

I rubber-stamp it :)

But you should use original changeset header (bug id 8012144, Summary: , Contributed-by: ) from hsx24 to track the same 
changes in both releases. You could use hg export/import since changes are the same.

We can keep 8006971 open for more general solution later with added comment that we used 8012144 changes in 7u40 and jdk8.


On 8/5/13 9:04 PM, David Holmes wrote:
> Hi David,
> On 6/08/2013 12:14 PM, David Chase wrote:
>> I'm not a reviewer, but if that's ABP queues (a comment pointing to the relevant literature would sure be nice, no
>> matter what it is; this stuff is toxically hard) I am in theory qualified to eyeball it, and I can perhaps nab some
>> people from SSRG upstairs.  (It's late and I am still jetlagged, so I am not looking hard tonight, but it sure looks
>> like ABP).
>> If it's not based on ABP queues, say so, and I will look more cautiously and with less authority.
> I believe, based on other correspondence, it is based on Chase-Lev work-stealing queue - but I'm not familiar with the
> ABP queues. I don't know who actually wrote this code but I suspect they are no longer working on the JDK.
> There is quite a bit of history with this one. The TaskQueue has been around for a while and works beautifully (as far
> as we know :) ) on TSO systems. The SAP folk doing the PPC64 port encountered some issues and proposed some fixes. Long
> story short we settled on this particular patch (a full fence() on the non-TSO systems) for 7u40 as it appeared to fix
> the known problems (even if potentially a heavier barrier than needed - hence the conditional compilation to not affect
> TSO systems).
> I considered this the "wrong fix" as to me the need for the fence() indicated a missing barrier elsewhere - plus the
> algorithm should be written without ifdefs - and I hoped we would resolve that before fixing it in JDK 8. But we haven't
> been able to resolve that and we need this fix in place ASAP - hence the fallback to using the same form as was accepted
> for 7u40.
> So basically this is a forward port of an existing fix and we just need to rubber-stamp it. But one of our original
> stampers is no longer with us so ...
> Thanks,
> David H.
>> David
>> On 2013-08-05, at 9:01 PM, David Holmes <david.holmes at oracle.com> wrote:
>>> Hi Vlad
>>> On 6/08/2013 12:43 AM, Vladimir Danushevsky wrote:
>>>> On Aug 3, 2013, at 6:53 AM, David Holmes wrote:
>>>>> On 2/08/2013 11:57 PM, Vladimir Danushevsky wrote:
>>>>>> The issue of missing memory barriers in the GC taskqueue code was first flagged here:
>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008848.html
>>>>>> JDK7u40 fix for the same issue is located here:
>>>>>> http://cr.openjdk.java.net/~dholmes/8012144/webrev.hs24/
>>>>>> Initially we planned to port same solution to JDK8 however after reviewing the algorithm more we've started
>>>>>> questioning a need for a full fence in between 'age' and 'bottom' elements. Since the intent is to keep 'bottom'
>>>>>> memory reference from being executed before 'age' would a LoadLoad barrier (which in many cases is a cheaper
>>>>>> solution) be sufficient? If so, the webrev below could possible be an adequate solution.
>>>>>> http://cr.openjdk.java.net/~vladidan/8006971/webrev.00/
>>>>>> We have tested both cases (fence and LL) on a hexa-core Power5 box running several test suites that currently
>>>>>> expose the problem. The results are positive.
>>>>> The loadload() should not be in any ifdef. The loadload() is part of the algorithmic correctness. The loadload()
>>>>> will become a no-op on any platform that does not need to do anything special to preserve the ordering.
>>>> As I understand LL is not an issue on platforms that are excluded from emitting the barrier in the provided patch.
>>>> However I went to read further discussion at
>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008848.html
>>>> and seems the concern is the Store is not guaranteed to propagate to all observers if read before the Writer's side
>>>> 'sync'. I speculate that might not be an issue on PowerPC implementations with L1 cache snooping though, but even if
>>>> this a case there is no robust way to detect that in runtime.
>>> That sounds like the fence() is missing on the writer side.
>>>> But that is likely not an issue on ARM (not sure about IA64, as it was listed in the very first webrev from Goetz)
>>>> therefore we might inject OrderAccess::fence() for PPC (both 32- and 64-bit) and OrderAccess::loadload() for ARM
>>>> (again, need info on IA64).
>>>> That being said , for simplicity we can go with fence() for ARM case too since current ARMv7 implementations do not
>>>> imply a separate barrier instruction for Loads.
>>>> So in other words use same patch as in JDK7u40:
>>>> http://cr.openjdk.java.net/~vladidan/8006971/webrev.01
>>> I think this is masking the real problem with the algorithm but unless we get some true expertise involved here I
>>> don't see how we will resolve this. And time is now critical.
>>> So I will accept the 7u version of the fix under the same rationale as for 7u. We do need this to be fixed. Reviewed.
>>> I hope this is not the end of it though as this data structure needs to be revisited in my opinion.
>>> We need a second reviewer.
>>> Thanks,
>>> David
>>>> Thanks,
>>>> Vlad
>>>>> Thanks,
>>>>> David
>>>>>> As a side note -
>>>>>> perhaps it is possible to eliminate age/bottom potential reordering by loading both simultaneously through an
>>>>>> Atomic class method. That would require though some structural changes to the layout of TaskQueueSuper class to
>>>>>> align both fields together and ensure proper integer alignment (depending on 32/64-bit port), therefore this
>>>>>> solution is less practical for the short term.
>>>>>> Thanks,
>>>>>> Vlad

More information about the hotspot-dev mailing list