CMS parallel initial mark

Jon Masamitsu jon.masamitsu at
Fri Jun 7 14:56:32 UTC 2013

On 6/6/2013 12:25 PM, Thomas Schatzl wrote:
> Hi all,
> On Thu, 2013-06-06 at 10:06 -0700, Jon Masamitsu wrote:
>> On 6/6/13 9:02 AM, Thomas Schatzl wrote:
>>> Hi,
>>> On Tue, 2013-05-28 at 17:24 -0700, Hiroshi Yamauchi wrote:
>>>> Hi,
>>>> I'd like to have the following contributed if it makes sense.
>>>> 1) Here's a patch (against a recent revision of the hsx/hotspot-gc repo):
>>>> that implements a parallel version of the initial mark phase of the
>>>> CMS collector. It's relatively a straightforward parallelization of
>>>> the existing single-threaded code. With the above patch, I see about
>>>> ~3-6x speedup in the initial mark pause times.
>>>     had a look at this patch in the version at
>>> Looks good so far, some comments:
>>>     - in cmsOopClosures.hpp, MarkRefsIntoClosure and the new
>>> Par_MarkRefsIntoClosure could be refactored slightly as they have
>>> exactly the same member variables. Not sure how this situation is
>>> handled in other code though, and what others (Jon) think.
>> Thomas,
>> If you don't mind I'd like to keep this changeset to a minimum so
>> not do any additional refactoring.  That's a good suggestion but
>> since this is the first sizable contribution I'm sponsoring, simpler
>> is better for me.
> Okay. It would be a tiny additional change though, which has only been
> enabled by the addition of the Par_MarkRefsIntoClosure, and of course
> depends on whether the old serial initial marking code is kept.


> (Maybe it is also possible to remove the underscore in the class name?
> It's not exactly customary to have class names with underscores in the
> Hotspot code - I think)
>>>     - in concurrentMarkSweepGeneration.cpp in
>>> CMSCollector::checkpointRootsInitialWork() the new code seems a little
>>> strange to me. I.e. roughly it does the following:
>>>     [...]
>>>     if (CMSParallelInitialMarkEnabled) {
>>>       [ set up parallel stuff ]
>>>       if (n_workers > 1) {
>>>         // do parallel thread
>>>       } else {
>>>         // do serial work using the current thread
>>>       }
>>>     } else {
>>>       // do serial work
>>>     }
>>> it does not feel good to have two serial paths here; maybe use the
>>> serial/original code if n_workers == 1 and guarantee(n_workers > 1, ) in
>>> the first part of the if? Or look if the old serial path could be
>>> removed? (or the new one).
>> Is the inner serial path one thread executing a single chunk that
>> does all the work (i.e., executes the parallel code but by only
>> one thread)?
> Yes, that is in my opinion a little awkward. It would be nice if we had
> only one serial initial marking code path to simplify
> debugging/maintenance if there are issues. Removal of one or the other
> (if so) probably depends on how different the performance between the
> two paths is.

I would prefer not to remove the path that executes the parallel code
with a single thread.  It is a tool for looking for performance anomalies
in the parallel code.  I've used code like it (i.e., executes parallel 
code in
one thread in UseParallelGC and UseParNewGC) this week while looking
for a performance problem.

The strictly serial path (which is the code we have today) would be nice
to have for a while as a workaround for any problems.

>>>    - the comments for at least CMSCollector::
>>> initialize_sequential_subtasks_for_young_gen_rescan() and
>>> CMSCollector::merge_survivor_plab_arrays() should be reworked - they
>>> only mention parallel rescan as entry points.
> I recommend fixing the comments though - wrong comments are not exactly
> helpful, although it's just an omission here.
>>>    - the change in the assert in
>>> CMSCollector::merge_survivor_plab_arrays() looks a little odd now. I
>>> mean, looking at that assert in isolation poses the question why is this
>>> method only called when doing initial mark in parallel? It somehow seems
>>> that the same work is done somewhere else just for the initial mark and
>>> serial case, which just looks strange to me. That's just an observation
>>> from me.
>>> assert(_collectorState == FinalMarking ||
>>>            (CMSParallelInitialMarkEnabled && _collectorState ==
>>> InitialMarking), "Error");
>>>    - in CMSCollector::CMSCollector() there is this unconditional setup of
>>> a few data structures used for parallel processing if any of the
>>> CMSParallel* switches is enabled.
>> I'm not sure what code this is.  What lines in the webrev?
> I hit the send button too eagerly (was at leaving the office), sorry. I
> meant ConcurrentMarkSweepGeneration.cpp:727, and was thinking about
> removing on of the serial initial marking paths.
> If the old serial version (old code) is kept, this condition needs to be
> adapted, as data structures within the if-block started at that line are
> only used in the parallel case.

Where talking about this like

   if ((CMSParallelRemarkEnabled && CMSParallelSurvivorRemarkEnabled) || 
CMSParallelInitialMarkEnabled) {

and you're saying the
> this condition needs to be
> adapted, as data structures within the if-block started at that line are
> only used in the parallel case.
Sorry, I still don't get it.  Are you saying the condition in the test is
not correct?  By "parallel case" do you mean the case where
more than one GC thread does the work?  Or are you counting
one GC thread executing the parallel code as part of the
"parallel case"?


> But that was just a thought as I am not sure how far you are with
> testing it.
> Thanks,
> Thomas

More information about the hotspot-gc-dev mailing list