RFR(M): 8033552: Fix missing missing volatile specifiers in CAS operations in GC code

Erik Österlund erik.osterlund at oracle.com
Mon Sep 12 15:14:27 UTC 2016

Hi Kim,

Thank you for your long and thorough review of my slightly too long 
patch. I have rebased the patch on the latest hotspot sources and split 
the patch into smaller chunks and CRs.
I will start sending them out soon so we can get this over and done with.

In general, the card table changes have been backed out so we can defer 
that to another day. The changes where you had no comments will remain 
in the CR. The rest have been split out to new CRs.

Here is a list of the new related CRs:

1) g1ConcurrentMark is missing volatile specifiers for _finger.
     Bug: https://bugs.openjdk.java.net/browse/JDK-8165856
2) CMS _overflow_list is missing volatile specifiers.
     Bug: https://bugs.openjdk.java.net/browse/JDK-8165857
3) heapRegionManager is missing volatile specifier for _claims.
     Bug: https://bugs.openjdk.java.net/browse/JDK-8165858
4) gcTaskThread is missing volatile specifier and barriers for _time_stamps.
     Bug: https://bugs.openjdk.java.net/browse/JDK-8165859
5) workgroup classes are missing volatile specifiers for lock-free code.
     Bug: https://bugs.openjdk.java.net/browse/JDK-8165860

Here is a new webrev for the rebased patch with the above changes and 
card table changes moved out:

On 2016-09-09 01:22, Kim Barrett wrote:
>> On Apr 6, 2016, at 2:54 PM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>> Hi,
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8033552
>> CR: http://cr.openjdk.java.net/~eosterlund/8033552/webrev.00/
>> Basically, there have been issues with Atomic::* operations being used on fields declared non-volatile. Especially in loop load phi cas phi-style code where the reload of a non-volatile field is not done because the compiler is allowed to assume it has not changed and keep the value in register.
>> My proposed patch fixes this by declaring all fields used in GC code volatile. As David Holmes said - all such code should use volatile unless there are peculiar reasons for specific code not to for performance reasons.
>> I also changed the cas phi load phi loops to not reload the field, for the simple reason that it is unnecessary - the CAS already performs the required load, so there is no need to do it again. So whether safe or not do do this with volatiles, a pointless load is a pointless load, so I removed them. (cf. g1ConcurrentMark.cpp and workgroup.cpp)
>> Testing:
>> * JPRT
>> * Benchmarked with SPECjbb2005 and SPECjvm2008 to make sure there was no regression
>> I will need a sponsor to push this.
> Finally made my way through (most of) this.  Sorry it's taken so long,
> but it's a pretty challenging changeset to review.
> ------------------------------------------------------------------------------
> Many of the changes here are to make the contents of card tables
> volatile.  I think those changes should be separated.  I'm not certain
> they should even be done, but not in their currently offered form, and
> not mixed with a bunch of unrelated changes.

Okay, let's defer that to a better time. We have had a few bugs that 
would probably have been avoided if card tables were volatile.
But perhaps this isn't the right time. I have excluded card table 
changes in my updated patches. Hope this will be revisited in the future.

> One issue with the changes for card tables is that they introduce a
> lot of casts to strip off the volatile qualifier.  Some of those could
> be eliminated by fixing / extending the presently volatile-free APIs
> being called that require those casts.  (For example, something like
> what was recently done with pointer_delta, adding volatile qualifiers
> to its parameters.)
> On the assumption these changes will be backed out, I've not reviewed
> the following files:
> src/cpu/aarch64/vm/aarch64.ad
> src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> src/cpu/ppc/vm/c1_Runtime1_ppc.cpp
> src/cpu/ppc/vm/macroAssembler_ppc.cpp
> src/cpu/ppc/vm/macroAssembler_ppc.hpp
> src/cpu/sparc/vm/c1_Runtime1_sparc.cpp
> src/cpu/sparc/vm/macroAssembler_sparc.cpp
> src/cpu/sparc/vm/macroAssembler_sparc.hpp
> src/cpu/sparc/vm/stubGenerator_sparc.cpp
> src/share/vm/c1/c1_LIRGenerator.cpp
> src/share/vm/gc/cms/parCardTableModRefBS.cpp
> src/share/vm/gc/g1/dirtyCardQueue.cpp
> src/share/vm/gc/g1/dirtyCardQueue.hpp
> src/share/vm/gc/g1/g1CardCounts.cpp
> src/share/vm/gc/g1/g1CardCounts.hpp
> src/share/vm/gc/g1/g1CollectedHeap.cpp
> src/share/vm/gc/g1/g1EvacFailure.cpp
> src/share/vm/gc/g1/g1HotCardCache.cpp
> src/share/vm/gc/g1/g1HotCardCache.hpp
> src/share/vm/gc/g1/g1ParScanThreadState.hpp
> src/share/vm/gc/g1/g1RemSet.cpp
> src/share/vm/gc/g1/g1RemSet.hpp
> src/share/vm/gc/g1/g1SATBCardTableModRefBS.cpp
> src/share/vm/gc/g1/heapRegion.cpp
> src/share/vm/gc/g1/heapRegion.hpp
> src/share/vm/gc/parallel/cardTableExtension.cpp
> src/share/vm/gc/parallel/cardTableExtension.hpp
> src/share/vm/gc/shared/cardTableModRefBS.cpp
> src/share/vm/gc/shared/cardTableModRefBS.hpp
> src/share/vm/gc/shared/cardTableModRefBS.inline.hpp
> src/share/vm/gc/shared/cardTableModRefBSForCTRS.hpp
> src/share/vm/gc/shared/cardTableRS.cpp
>    -- card table changes only

Okay, that's fine. Backed those files out from this CR.

> And I've only reviewed the non-card-table changes in these files:
> src/share/vm/jvmci/jvmciCompilerToVM.cpp
> src/share/vm/jvmci/jvmciCompilerToVM.hpp
> src/share/vm/runtime/vmStructs.cpp
>    -- mix of card table and other
>    -- no comments on the non-card-table changes

Card-table changes have been backed out.

> ------------------------------------------------------------------------------
> src/share/vm/gc/shared/cardTableRS.hpp
>    92   volatile jbyte* _last_cur_val_in_gen;
> I don't think the volatile qualifier needs to be added here.
> The values are only assigned in single-threaded contexts
> (initialization, and in younger_refs_iterate from gen_process_roots).
> The values are only read by worker threads started after the
> assignment (or read by the assigning thread if parallel workers aren't
> involved).
> Other changes to this file are specific to card tables, which I've not
> reviewed.

This was part of my non-volatile card rampage. I decided that the type 
of cards should consistently be volatile jbyte* and that's what we 
should be passing around without compromising unless there are good 
reasons not to, instead of the other way around. But with card table 
changes out of this review, I removed these changes too.

> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/g1ConcurrentMark.cpp
> 1924       assert(_finger > finger, "the finger should have moved forward");
> 1925       // read it again
> 1926       finger = res;
> [Change was to replace "finger = _finger;" with "finger = res;"]
> This change doesn't seem necessary.  If the change is made, the
> preceeding assert and comment also need updating.

I have split the changes of g1ConcurrentMark out to a separate CR, and 
solved it by changing the assert and removed the comment.

> ------------------------------------------------------------------------------
> src/share/vm/gc/cms/cmsOopClosures.hpp
> src/share/vm/gc/cms/concurrentMarkSweepGeneration.cpp
>    -- no comments


> ------------------------------------------------------------------------------
> src/share/vm/gc/cms/parNewGeneration.hpp
>   326   volatile oop _overflow_list;
> Adding volatile to _overflow_list induces the changes to
> src/share/vm/oops/oopsHierarchy.hpp
> adding many volatile operator overloads for the oop class.
> I'm not keen on the oop changes, and that seems like a lot of fannout
> for the _overflow_list change.  Also, _overflow_list doesn't really
> involve oops, more like the shattered remains of oops.
> I think a better change would be to not add the overloads to the oop
> class, and instead change the type of _overflow_list, perhaps to
> something like "HeapWord* volatile", and carry through the adjustments
> needed for that.
> I think such a change ought to be done under a different CR, with the
> relevant changes from this changeset backed out.  I *think* the
> existing manipulation of _overflow_list has sufficient barriers to
> work even in the absence of the volatile qualifier, and that's what
> we've been using all along.
> ------------------------------------------------------------------------------
> src/share/vm/gc/cms/concurrentMarkSweepGeneration.hpp
> I think CMSCollector::_overflow_list may also need a volatile qualifier.
> But see discussion above for parNewGeneration.hpp and
> oopsHierarchy.hpp about ParNewGeneration::_overflow_list.

I have moved the lack of volatiles for the overflow lists to a separate 
CR, and addressed it in the way you propose.

> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/heapRegionManager.cpp
>   485   _claims = NEW_C_HEAP_ARRAY(uint, _n_regions, mtGC);
>   486   memset((uint*)_claims, Unclaimed, sizeof(*_claims) * _n_regions);
> I would prefer a const_cast to strip away volatile.  Could avoid cast by
>    uint* new_claims = NEW_C_HEAP_ARRAY(...);
>    memset(new_claims, ...);
>    _claims = new_claims;

Okay. I have created a new CR and new patch with these specific changes.

> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/heapRegionManager.hpp
> src/share/vm/gc/g1/heapRegionRemSet.cpp
>    -- no comments
> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/sparsePRT.hpp
>    -- no direct comments, however:
> src/share/vm/gc/g1/sparsePRT.cpp
> Can add_to_expanded_list be called by multiple threads with the same
> argument at the same time?  If so, it doesn't look correct.  It has a
> comment saying multiple expansions are possible, but should only be
> put on list once.  But it's not clear whether that can happen from
> multiple threads at the same time.  The check expanded / set expanded
> dance is not thread safe.

There is one sparsePRT per region, and they are split between concurrent 
producer and concurrent consumers, with a safepoint. That is, when the 
mutators are running, concurrent producers compete for producing entries 
for the sparsePRT, and before scanning in a safepoint, concurrent 
consumers compete for consuming the entries.

Therefore, I can not spot any potential concurrency issues. And 
therefore they remain as part of this CR.

> ------------------------------------------------------------------------------
> src/share/vm/gc/parallel/gcTaskThread.hpp
>    -- no direct comments, however:
> src/share/vm/gc/parallel/gcTaskThread.hpp
> time_stamp_at() is using a variant of DCLP, but is missing the acquire
> on the initial read of _time_stamps.

Yes this class seems broken when looking closer. There is another user 
of _time_stamps below that also misses fencing. I have created a new CR 
with proper fencing for the time stamps that we found here.

> ------------------------------------------------------------------------------
> src/share/vm/gc/parallel/mutableSpace.hpp
> src/share/vm/gc/parallel/parallelScavengeHeap.hpp
>    -- no comments
> ------------------------------------------------------------------------------
> src/share/vm/gc/parallel/psCompactionManager.cpp
> src/share/vm/gc/parallel/psCompactionManager.hpp
> _recycled_bottom and friends were eliminated by
> 8150994: UseParallelGC fails with UseDynamicNumberOfGCThreads with specjbb2005

Ok. I have backed out these changes in my rebased patch.

> ------------------------------------------------------------------------------
> src/share/vm/gc/parallel/psYoungGen.hpp
> src/share/vm/gc/serial/defNewGeneration.cpp
> src/share/vm/gc/serial/defNewGeneration.hpp
> src/share/vm/gc/shared/collectedHeap.hpp
> src/share/vm/gc/shared/genCollectedHeap.cpp
> src/share/vm/gc/shared/genCollectedHeap.hpp
> src/share/vm/gc/shared/generation.hpp
> src/share/vm/gc/shared/taskqueue.hpp
>    -- no comments


> ------------------------------------------------------------------------------
> src/share/vm/gc/shared/workgroup.hpp
> src/share/vm/gc/shared/workgroup.cpp
>   287 class SubTasksDone: public CHeapObj<mtInternal> {
>   288   volatile uint* _tasks;
> ...
>   290   volatile uint _threads_completed;
> Added volatile to member declarations, but corresponding changes in
> the cpp file seem to be here:
>   452 bool SequentialSubTasksDone::is_task_claimed(uint& t) {
>   464 bool SequentialSubTasksDone::all_tasks_completed() {
> So volatile was added to similarly (but not identially) named members
> of a different class from where the change is needed.  But perhaps
> parallel changes are needed for both classes?

Agreed; more volatiles is better. I have split out the workgroup files 
into a separate CR that adds more volatiles as you suggested, to both of 
the two classes.

> ------------------------------------------------------------------------------
> src/share/vm/gc/parallel/vm_Structs_parallelgc.hpp
>    -- no comments
> ------------------------------------------------------------------------------


I have tried running the new patch through JPRT, and it was fine.

Thanks a lot for going through these changes for me. Appreciate it! Will 
try and write smaller patches in the future.
Have prepared the new related patches and will roll them out as they 
make it through tests.


More information about the hotspot-gc-dev mailing list