RFR(M): 8033552: Fix missing missing volatile specifiers in CAS operations in GC code

Erik Osterlund erik.osterlund at oracle.com
Fri Sep 16 07:21:56 UTC 2016

Hi Kim,

On 14 sep. 2016, at 22:49, Kim Barrett <kim.barrett at oracle.com> wrote:

>> On Sep 13, 2016, at 11:28 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>> Hi Kim,
>> New webrev with your proposed changes:
>> http://cr.openjdk.java.net/~eosterlund/8033552/webrev.05/
> Looks good.

Thanks for the review! :) I would need a sponsor to push this.

> A few replies inline below.

And some for me too.

>> On 2016-09-13 00:20, Kim Barrett wrote:
>>> src/share/vm/gc/shared/taskqueue.hpp
>>> 129     void increment() volatile {
>>> 130       _fields._top = increment_index(_fields._top);
>>> 131       if (_fields._top == 0) ++_fields._tag;
>>> 132     }
>>> Perhaps this should be
>>>  uint new_top = increment_index(_fields._top);
>>>  _fields._top = new_top;
>>>  if (new_top == 0) ++_fields._tag;
>>> Although if it matters then we probably have other
>>> problems... e.g. adding a volatile qualifier doesn't make this
>>> thread-safe. And if it doesn't need to be thread-safe, then why add
>>> the volatile qualifier?
>> I do not have strong opinions on this one. It should not matter as you say, but it seemed more consistent when the other member functions above were implemented with volatile.
>> I decided to withdraw that volatile specifier on this one as it seems you would prefer that.
> The addition of the volatile qualifier suggests a need for
> thread-safety, but the existing code isn't (nor is my suggested
> change, though it's a step along the way).
> So I looked at how increment() is used, and the volatile isn't needed.
> It is only applied to a freshly made copy.
> So I accept the withdrawal of that change.

Yeah I just tried to be consistent with the rest but it's really not necessay, and I'm not quite sure why that class is so MT-defensive in general.

>>> -----------------------------------------------------------------------------
>>> [This question from my initial review seems not covered anywhere.  I
>>> don't know if there is actually a bug here.  If there is, it should be
>>> a new CR.]
>>> src/share/vm/gc/g1/sparsePRT.hpp
>>> -- no direct comments, however:
>>> src/share/vm/gc/g1/sparsePRT.cpp
>>> Can add_to_expanded_list be called by multiple threads with the same
>>> argument at the same time?  If so, it doesn't look correct.  It has a
>>> comment saying multiple expansions are possible, but should only be
>>> put on list once.  But it's not clear whether that can happen from
>>> multiple threads at the same time.  The check expanded / set expanded
>>> dance is not thread safe.
>> I did answer it, but sorry if I was not explicit enough that my comment was linked to this. Yes it can be called by multiple threads. However, I can not see what is wrong about that.
>> If you are thinking about the classic ABA problem, I can not see how that would be triggered, because the list is operated on in split phases protected by a safepoint where it's either add-only (outside of safepoint, while adding sparsePRT data), or remove-only (inside of safepoint, before card scanning).
>> Having multiple competing threads only adding or only removing from the list without explicit ABA-protection mechanisms should be safe.
>> The only remaining problem is the variable _expanded. And it is changed only on the current sparsePRT, while under the per-region lock in OtherRegionsTable::add_reference() calling SparsePRT::add_card, calling SparsePRT::add_to_expanded_list, ultimately calling add_to_expanded_list that we are currently looking at.
>> So the data for the specific SparsePRT is changed under a per-region lock, and it's added to a list (which can happen concurrent to other SparsePRTs being expanded), but that is fine as the ABA-problem can't occur as the entries do not get deleted until after a safepoint synchronization, when the list is consumed during cleanup before card scanning.
> The point is that there's a comment that it should only be added to
> the list once, but the mechanism for doing so is not thread-safe, so
> unless there is a lock somewhere protecting the call (which Thomas
> provided a pointer to - thanks Thomas), then the constraint described
> by the comment may not be met.

Yes I see your point - it is not obvious that 1) consumers and producers for the list are split in phases with a safepoint (otherwise it wouldn't work), and 2) the local sparsePRT fields must be changed when under a per-region lock way earlier in the call tree.

Thanks a lot for reviewing,

More information about the hotspot-gc-dev mailing list