RFR: 8006971: More missing barriers in taskqueues for RMO architectures

David Holmes david.holmes at oracle.com
Mon Aug 5 23:40:21 PDT 2013

On 6/08/2013 4:02 PM, Vladimir Kozlov wrote:
> I rubber-stamp it :)
> Reviewed.
> But you should use original changeset header (bug id 8012144, Summary: ,
> Contributed-by: ) from hsx24 to track the same changes in both releases.
> You could use hg export/import since changes are the same.
> We can keep 8006971 open for more general solution later with added
> comment that we used 8012144 changes in 7u40 and jdk8.

Thanks Vladimir - yes that is an excellent point. I would have 
overlooked that and used 8006971 instead of 8012144. Though that does 
leave me a bit puzzled about what 8006971 actually refers to now.


> Thanks,
> Vladimir
> On 8/5/13 9:04 PM, David Holmes wrote:
>> Hi David,
>> On 6/08/2013 12:14 PM, David Chase wrote:
>>> I'm not a reviewer, but if that's ABP queues (a comment pointing to
>>> the relevant literature would sure be nice, no
>>> matter what it is; this stuff is toxically hard) I am in theory
>>> qualified to eyeball it, and I can perhaps nab some
>>> people from SSRG upstairs.  (It's late and I am still jetlagged, so I
>>> am not looking hard tonight, but it sure looks
>>> like ABP).
>>> If it's not based on ABP queues, say so, and I will look more
>>> cautiously and with less authority.
>> I believe, based on other correspondence, it is based on Chase-Lev
>> work-stealing queue - but I'm not familiar with the
>> ABP queues. I don't know who actually wrote this code but I suspect
>> they are no longer working on the JDK.
>> There is quite a bit of history with this one. The TaskQueue has been
>> around for a while and works beautifully (as far
>> as we know :) ) on TSO systems. The SAP folk doing the PPC64 port
>> encountered some issues and proposed some fixes. Long
>> story short we settled on this particular patch (a full fence() on the
>> non-TSO systems) for 7u40 as it appeared to fix
>> the known problems (even if potentially a heavier barrier than needed
>> - hence the conditional compilation to not affect
>> TSO systems).
>> I considered this the "wrong fix" as to me the need for the fence()
>> indicated a missing barrier elsewhere - plus the
>> algorithm should be written without ifdefs - and I hoped we would
>> resolve that before fixing it in JDK 8. But we haven't
>> been able to resolve that and we need this fix in place ASAP - hence
>> the fallback to using the same form as was accepted
>> for 7u40.
>> So basically this is a forward port of an existing fix and we just
>> need to rubber-stamp it. But one of our original
>> stampers is no longer with us so ...
>> Thanks,
>> David H.
>>> David
>>> On 2013-08-05, at 9:01 PM, David Holmes <david.holmes at oracle.com> wrote:
>>>> Hi Vlad
>>>> On 6/08/2013 12:43 AM, Vladimir Danushevsky wrote:
>>>>> On Aug 3, 2013, at 6:53 AM, David Holmes wrote:
>>>>>> On 2/08/2013 11:57 PM, Vladimir Danushevsky wrote:
>>>>>>> The issue of missing memory barriers in the GC taskqueue code was
>>>>>>> first flagged here:
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008848.html
>>>>>>> JDK7u40 fix for the same issue is located here:
>>>>>>> http://cr.openjdk.java.net/~dholmes/8012144/webrev.hs24/
>>>>>>> Initially we planned to port same solution to JDK8 however after
>>>>>>> reviewing the algorithm more we've started
>>>>>>> questioning a need for a full fence in between 'age' and 'bottom'
>>>>>>> elements. Since the intent is to keep 'bottom'
>>>>>>> memory reference from being executed before 'age' would a
>>>>>>> LoadLoad barrier (which in many cases is a cheaper
>>>>>>> solution) be sufficient? If so, the webrev below could possible
>>>>>>> be an adequate solution.
>>>>>>> http://cr.openjdk.java.net/~vladidan/8006971/webrev.00/
>>>>>>> We have tested both cases (fence and LL) on a hexa-core Power5
>>>>>>> box running several test suites that currently
>>>>>>> expose the problem. The results are positive.
>>>>>> The loadload() should not be in any ifdef. The loadload() is part
>>>>>> of the algorithmic correctness. The loadload()
>>>>>> will become a no-op on any platform that does not need to do
>>>>>> anything special to preserve the ordering.
>>>>> As I understand LL is not an issue on platforms that are excluded
>>>>> from emitting the barrier in the provided patch.
>>>>> However I went to read further discussion at
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008848.html
>>>>> and seems the concern is the Store is not guaranteed to propagate
>>>>> to all observers if read before the Writer's side
>>>>> 'sync'. I speculate that might not be an issue on PowerPC
>>>>> implementations with L1 cache snooping though, but even if
>>>>> this a case there is no robust way to detect that in runtime.
>>>> That sounds like the fence() is missing on the writer side.
>>>>> But that is likely not an issue on ARM (not sure about IA64, as it
>>>>> was listed in the very first webrev from Goetz)
>>>>> therefore we might inject OrderAccess::fence() for PPC (both 32-
>>>>> and 64-bit) and OrderAccess::loadload() for ARM
>>>>> (again, need info on IA64).
>>>>> That being said , for simplicity we can go with fence() for ARM
>>>>> case too since current ARMv7 implementations do not
>>>>> imply a separate barrier instruction for Loads.
>>>>> So in other words use same patch as in JDK7u40:
>>>>> http://cr.openjdk.java.net/~vladidan/8006971/webrev.01
>>>> I think this is masking the real problem with the algorithm but
>>>> unless we get some true expertise involved here I
>>>> don't see how we will resolve this. And time is now critical.
>>>> So I will accept the 7u version of the fix under the same rationale
>>>> as for 7u. We do need this to be fixed. Reviewed.
>>>> I hope this is not the end of it though as this data structure needs
>>>> to be revisited in my opinion.
>>>> We need a second reviewer.
>>>> Thanks,
>>>> David
>>>>> Thanks,
>>>>> Vlad
>>>>>> Thanks,
>>>>>> David
>>>>>>> As a side note -
>>>>>>> perhaps it is possible to eliminate age/bottom potential
>>>>>>> reordering by loading both simultaneously through an
>>>>>>> Atomic class method. That would require though some structural
>>>>>>> changes to the layout of TaskQueueSuper class to
>>>>>>> align both fields together and ensure proper integer alignment
>>>>>>> (depending on 32/64-bit port), therefore this
>>>>>>> solution is less practical for the short term.
>>>>>>> Thanks,
>>>>>>> Vlad

More information about the hotspot-dev mailing list