RFR(S): 8005032: G1: Cleanup serial reference processing closures in concurrent marking

Bengt Rutisson bengt.rutisson at oracle.com
Thu Jan 31 08:00:11 UTC 2013


On 1/30/13 6:51 PM, Jon Masamitsu wrote:
> On 1/30/2013 5:21 AM, Bengt Rutisson wrote:
>> Hi Charlie,
>> On 1/17/13 9:52 PM, Charlie Hunt wrote:
>>> John / Bengt:
>>> I think I can offer a bit of info on Bengt's earlier question about 
>>> ParallelProcRefEnabled being disabled by default.
>>> IIRC, there was one workload that showed a slight perf regression 
>>> with +ParallelProcRefEnabled.  That workload that showed a 
>>> regression may not be as relevant as it was back when the evaluation 
>>> / decision was made to disable it by default?
>> Thanks for providing some history for these flags!
>>> You both have probably thought about this already?  My reaction is 
>>> ... I think reasonable defaults would be to enable 
>>> +ParallelProcRefEnabled for Parallel[Old], CMS and G1 when 
>>> ParallelGCThreads is greater than 1, and disable 
>>> -ParallelProcRefEnabled with -XX:+UseSerialGC.
>> This sounds like a good enhancement. John, if you agree, could you 
>> file a CR for it? Or would you like me to file it?
> I think  the regression was with benchmarks that did not have much 
> Reference processing
> but still had to pay the build up/tear down cost for the parallel 
> work.  Andrew did the work and
> I think there was lots of performance data that went into turning it 
> off by default.

Thanks, that is good to know. Do you remember if the benchmark had 
regressions for all GCs or just some of them? I think Charlies 
suggestion was to make parallel reference processing true by default for 
the GCs where we think it would be beneficial. Not necessarily for all 
of them.

Do you know if it is possible to find the original performance data some 


> Jon
>> Bengt
>>> hths,
>>> charlie ...
>>> On Jan 17, 2013, at 3:02 PM, John Cuthbertson wrote:
>>>> Hi Bengt,
>>>> There's a new webrev at: 
>>>> http://cr.openjdk.java.net/~johnc/8005032/webrev.1/
>>>> It looks larger than the previous webrev but the most of the change 
>>>> was
>>>> tweaking comments. The actual code changes are smaller.
>>>> Testing was the same as before.
>>>> On 1/15/2013 1:18 AM, Bengt Rutisson wrote:
>>>>> I see. I didn't think about the difference betweeen ParallelGCThreads
>>>>> and ParallelRefProcEnabled. BTW, not part of this change, but why do
>>>>> we have ParallelRefProcEnabled? And why is it false by default?
>>>>> Wouldn't it make more sense to have it just be dependent on
>>>>> ParallelGCThreads?
>>>> I don't know and the answer is probably lost in the dark depths of 
>>>> time
>>>> - I can only speculate. For G1 we have a CR to turn
>>>> ParallelRefProcEnabled on if the number of GC threads > 1. I'm not 
>>>> sure
>>>> about the other collectors.
>>>>>> Setting it once in weakRefsWork() will not be sufficient. We will 
>>>>>> run
>>>>>> into an assertion failure in
>>>>>> ParallelTaskTerminator::offer_termination().
>>>>>> During the reference processing, the do_void() method of the
>>>>>> complete_gc oop closure (in our case the complete gc oop closure is
>>>>>> an instance of G1CMParDrainMarkingStackClosure) is called multiple
>>>>>> times (in process_phase1, sometimes process_phase2, process_phase3,
>>>>>> and process_phaseJNI)
>>>>>> Setting the phase sets the number of active tasks (or threads) that
>>>>>> the termination protocol in do_marking_step() will wait for. When an
>>>>>> invocation of do_marking_step() offers termination, the number of
>>>>>> tasks/threads in the terminator instance is decremented. So Setting
>>>>>> the phase once will let the first execution of do_marking_step (with
>>>>>> termination) from process_phase1() succeed, but subsequent calls to
>>>>>> do_marking_step() will result in the assertion failure.
>>>>>> We also can't unconditionally set it in the do_void() method or even
>>>>>> the constructor of G1CMParDrainMarkingStackClosure. Separate
>>>>>> instances of this closure are created by each of the worker threads
>>>>>> in the MT-case.
>>>>>> Note when processing is multi-threaded the complete_gc instance used
>>>>>> is the one passed into the ProcessTask's work method (passed into
>>>>>> process_discovered_references() using the task executor instance)
>>>>>> which may not necessarily be the same complete gc instance as the 
>>>>>> one
>>>>>> passed directly into process_discovered_references().
>>>>> Thanks for this detailed explanation. It really helped!
>>>>> I understand the issue now, but I still think it is very confusing
>>>>> that _cm->set_phase() is called from
>>>>> G1CMRefProcTaskExecutor::execute() in the multithreaded case and from
>>>>> G1CMParDrainMarkingStackClosure::do_void() in the single threaded 
>>>>> case.
>>>>>> It might be possible to record whether processing is MT in the
>>>>>> G1CMRefProcTaskExecutor class and always pass the executor instance
>>>>>> into process_discovered_references. We could then set processing to
>>>>>> MT so that the execute() methods in the executor instance are 
>>>>>> invoked
>>>>>> but call the Proxy class' work method directly. Then we could
>>>>>> override the set_single_threaded() routine (called just before
>>>>>> process_phaseJNI) to set the phase.
>>>>> I think this would be a better solution, but if I understand it
>>>>> correctly it would mean that we would have to change all the
>>>>> collectors to always pass a TaskExecutor. All of them currently pass
>>>>> NULL in the non-MT case. I think it would be simpler if they always
>>>>> passed a TaskExecutor but it is a pretty big change.
>>>> I wasn't meaning to do that for the other collectors just G1's
>>>> concurrent mark reference processor i.e. fool the ref processor into
>>>> think it's MT so that the parallel task executor is used but only use
>>>> the work gang if reference processing was _really_ MT.
>>>> I decided not to do this as there is an easier way. For the non-MT 
>>>> case
>>>> we do not need to enter the termination protocol in
>>>> CMTask::do_marking_step(). When there's only one thread we don't 
>>>> need to
>>>> use the ParallelTaskTerminator to wait for other threads. And we
>>>> certainly don't need stealing. Hence the solution is to only do the
>>>> termination and stealing if the closure is instantiated for MT 
>>>> reference
>>>> processing. That removes the set_phase call().
>>>>> Another possibility is to introduce some kind of prepare method to 
>>>>> the
>>>>> VoidClosure (or maybe in a specialized subclass for ref processing).
>>>>> Then we could do something like:
>>>>>   complete_gc->prologue();
>>>>>   if (mt_processing) {
>>>>>     RefProcPhase2Task phase2(*this, refs_lists, 
>>>>> !discovery_is_atomic()
>>>>> /*marks_oops_alive*/);
>>>>>     task_executor->execute(phase2);
>>>>>   } else {
>>>>>     for (uint i = 0; i < _max_num_q; i++) {
>>>>>       process_phase2(refs_lists[i], is_alive, keep_alive, 
>>>>> complete_gc);
>>>>>     }
>>>>>   }
>>>>> G1CMParDrainMarkingStackClosure::prologue() could do the call to
>>>>> _cm->set_phase(). And G1CMRefProcTaskExecutor::execute() would not
>>>>> have to do it.
>>>> The above is a reasonable extension to the reference processing 
>>>> code. I
>>>> no longer need this feature for this change but we should submit a CR
>>>> for it. I'll do that.
>>>>> BTW, not really part of your change, but above code is duplicated
>>>>> three times in ReferenceProcessor::process_discovered_reflist(). 
>>>>> Would
>>>>> be nice to factor this out to a method.
>>>> Completely agree. Again I'll submit a CR for it.
>>>> Thanks,
>>>> JohnC

More information about the hotspot-gc-dev mailing list