RFR: JDK-8134953: Make the GC ID available in a central place

Bengt Rutisson bengt.rutisson at oracle.com
Wed Sep 30 07:20:33 UTC 2015

On 2015-09-29 18:08, Jon Masamitsu wrote:
> On 9/29/15 8:57 AM, Bengt Rutisson wrote:
>> On 2015-09-29 17:59, Jon Masamitsu wrote:
>>> On 9/29/15 1:09 AM, Bengt Rutisson wrote:
>>>> Hi Jon,
>>>> On 2015-09-28 19:19, Jon Masamitsu wrote:
>>>>> On 09/28/2015 06:21 AM, Bengt Rutisson wrote:
>>>>>> Hi Jon and Per,
>>>>>> I've been trying to get the version proposed in webrev.02 to work 
>>>>>> but I ran into a show stopper for this approach. The G1 
>>>>>> concurrent mark thread is at times executing (and logging) at the 
>>>>>> same time as a young or mixed GC is executing. There is no good 
>>>>>> way of handling this with a common place to store the GC ID. 
>>>>>> Instead I've decided to go back to storing the current GC ID in 
>>>>>> the thread. That way the G1 concurrent marking and the 
>>>>>> young/mixed GCs will have their own GC IDs.
>>>>>> Here's an updated webrev:
>>>>>> http://cr.openjdk.java.net/~brutisso/8134953/webrev.03/
>>>>> Bengt,
>>>>> The _gc_id is needed in a JavaThread because a JavaThread
>>>>> might do work for G1 (concurrent refinement for example) and
>>>>> needs a GC ID?
>>>> The Java threads currently don't log anything when they do GC work. 
>>>> But in the future it is likely that they will.
>>>>> But a WatcherThread would never need one,
>>>>> right?
>>>> Right. Initially I only added the _gc_id field to NamedThread. But 
>>>> Per thought it would be better to have it in Thread. Both for 
>>>> making it possible for future improvements (such as logging GC work 
>>>> from JavaThreads) and because it seems like we normally keep 
>>>> storage in the Thread class even for things that are not used by 
>>>> all subclasses. For example TLABs and allocated_bytes() are only 
>>>> relevant for JavaThreads but have their storage in Thread.
>>> Yes, Thread could be improved but I'm not sure I can stand adding GC 
>>> data where
>>> it's not needed.   Unless GC logging from JavaThread is something 
>>> that's going to
>>> happen soon, I'd rather add it when it's needed rather than right 
>>> now.   I'd personally
>>> rather duplicate the code in JavaThread if it is needed rather than 
>>> have WatcherThreads
>>> have a _gc_id.
>> I tend to agree. I'll discuss this with Per tomorrow. If he agrees 
>> that we can move the _gc_id down to NamedThread, are you ok with me 
>> pushing the fix?
> Yes,  I'm good with the fix then.

Thanks Jon!

I talked to Per and moved the _gc_id field down to NamedThread. Pushing 
it now.

> Would it be appropriate for me to file a CR to change NamedThread to 
> something
> like GCThread?  Or ???

I don't think GCThread is the right name since VMThread is a NamedThread 
and it can schedule tasks that are not GC related. I agree that 
NamedThread is not a great name. But I actually think that the naming in 
the Thread hierarchy is just part of the problem. The real problem is 
that the division of responsibility is not really correctly partitioned 
among the classes.

If we should file a RFE I think it should be to clean up the Thread 
hierarchy, including finding good names for the classes.


> Jon
>> Thanks,
>> Bengt
>>> Jon
>>>> Thanks,
>>>> Bengt
>>>>> Jon
>>>>>> This is the same as webrev.01 that both of you already looked at. 
>>>>>> I've made two minor adjustments:
>>>>>> - I removed the currentNamedthread() function in gcId.cpp that 
>>>>>> Per pointed out in his review of webrev.01. Instead I use 
>>>>>> Thread::current() directly.
>>>>>> - I made GCIdMark a StackObj.
>>>>>> Otherwise the patch is the same as in webrev.01.
>>>>>> Thanks,
>>>>>> Bengt
>>>>>> On 2015-09-10 14:37, Bengt Rutisson wrote:
>>>>>>> Hi Jon,
>>>>>>> Thanks for looking at this!
>>>>>>> On 2015-09-10 01:00, Jon Masamitsu wrote:
>>>>>>>> Bengt,
>>>>>>>> When a CMS concurrent collection hands off to a STW foreground 
>>>>>>>> collection,
>>>>>>>> (done in acquire_control_and_collect()), I expected a new GCId 
>>>>>>>> would be
>>>>>>>> used.  Did I miss it?   That STW collection does not go through 
>>>>>>>> do_collection().
>>>>>>> The call to acquire_control_and_collect() originates from 
>>>>>>> GenCollectedHeap::collect_generation() and there is a new 
>>>>>>> GCIdMark in there that will create a new GCId for the STW 
>>>>>>> collection. That's the "extra" GCIdMark that Per asked about in 
>>>>>>> his review.
>>>>>>> *But* I wanted to verify that it worked properly since you asked 
>>>>>>> about it and I realized that there is another bug.
>>>>>>> The GCId that is set up for the concurrent cycle, the one set up 
>>>>>>> in ConcurrentMarkSweepThread::run(), will is still active over 
>>>>>>> the complete STW foreground collection. That's fine in my model 
>>>>>>> since the new GCIdMark in GenCollectedHeap::collect_generation() 
>>>>>>> will cache that GCId. But the ConcurrentMarkSweep thread is not 
>>>>>>> completely idle even though control has been left to the 
>>>>>>> foreground collection in acquire_control_and_collect(). So, 
>>>>>>> there is some logging done (by fore example CMSPhaseAccounting) 
>>>>>>> that is done at the same time as the foreground collection. This 
>>>>>>> logging will now use the foreground GCId instead of the CMS GCId.
>>>>>>> I haven't had time to dig in to the details of that just yet. 
>>>>>>> But this is an unintended change of the logging, so I would like 
>>>>>>> to fix it. Hopefully I'll have an updated webrev tomorrow.
>>>>>>> Thanks,
>>>>>>> Bengt
>>>>>>>> Jon
>>>>>>>> On 9/9/2015 7:40 AM, Bengt Rutisson wrote:
>>>>>>>>> Hi again,
>>>>>>>>> I found an issue with how G1 young collections kick off 
>>>>>>>>> concurrent marks. There is a short window where we might have 
>>>>>>>>> two active GC IDs at the same time. I've fixed this and 
>>>>>>>>> updated the webrevs. In case anyone had already started 
>>>>>>>>> looking at the webrevs you need to do a refresh now. The fix 
>>>>>>>>> is in G1CollectedHeap::do_collection_pause_at_safepoint().
>>>>>>>>> Thanks,
>>>>>>>>> Bengt
>>>>>>>>> On 09/09/15 13:38, Bengt Rutisson wrote:
>>>>>>>>>> Hi Per,
>>>>>>>>>> Thanks for looking at this!
>>>>>>>>>> On 2015-09-08 15:23, Per Liden wrote:
>>>>>>>>>>> Hi Bengt,
>>>>>>>>>>> On 2015-09-08 13:35, Bengt Rutisson wrote:
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>> This is mostly a GC related patch, but it adds a field to 
>>>>>>>>>>>> the Thread
>>>>>>>>>>>> class, so I'm sending this out on the broader hotspot-dev 
>>>>>>>>>>>> list.
>>>>>>>>>>>> Could I have a couple of reviews for this patch?
>>>>>>>>>>>> http://cr.openjdk.java.net/~brutisso/8134953/webrev.01/
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8134953
>>>>>>>>>>> Looks good. I think this is a nice simplification, 
>>>>>>>>>>> especially for G1's concurrent parts. Just two comments:
>>>>>>>>>>> genCollectedHeap.cpp:
>>>>>>>>>>> ---------------------
>>>>>>>>>>> - GenCollectedHeap::do_collection() has two GCIdMarks, in 
>>>>>>>>>>> different scopes. Do we really want that second mark?
>>>>>>>>>> We potentially do two GCs in 
>>>>>>>>>> GenCollectedHeap::do_collection(). First a young GC and then 
>>>>>>>>>> potentially a full GC. These two should have different GC 
>>>>>>>>>> IDs. So, yes, we need two GCIdMarks in this method. The 
>>>>>>>>>> scoping could be better, but I think that is a refactoring 
>>>>>>>>>> that should be done separately from my current patch. I'll 
>>>>>>>>>> talk to Jesper about it since he has been cleaning up this 
>>>>>>>>>> code lately.
>>>>>>>>>>> gcId.cpp:
>>>>>>>>>>> --------
>>>>>>>>>>> - I think you might have left currentNamedthread() in there. 
>>>>>>>>>>> You probably just want to use Thread::current() instead.
>>>>>>>>>> Yes, good catch. Thanks.
>>>>>>>>>> However, after thinking some more about the implications of 
>>>>>>>>>> moving the GC ID out from the GCTracer I figured it would be 
>>>>>>>>>> possible to just store the previous GC ID in the GCIdMark and 
>>>>>>>>>> then restore it in the destructor of that class. That way we 
>>>>>>>>>> don't have to store anything in the Thread class but can 
>>>>>>>>>> still have both a concurrent GC ID and a STW GC ID active at 
>>>>>>>>>> the same time. This also removes the need to copy the GC ID 
>>>>>>>>>> to the worker tasks.
>>>>>>>>>> Here's an updated webrev with a solution that does not add 
>>>>>>>>>> anything to the Thread class:
>>>>>>>>>> http://cr.openjdk.java.net/~brutisso/8134953/webrev.02/
>>>>>>>>>> And here's a diff compared to the last version:
>>>>>>>>>> http://cr.openjdk.java.net/~brutisso/8134953/webrev.01-02.diff/
>>>>>>>>>> Unfotunately the webrev tool had some bad luck when creating 
>>>>>>>>>> the diff webrev so at least the g1CollectedHeap.cpp seem to 
>>>>>>>>>> contain the complete changes rather than just the diff 
>>>>>>>>>> compared to the 01 version. The rest of the diff looks 
>>>>>>>>>> correct as far as I can tell.
>>>>>>>>>> Thanks,
>>>>>>>>>> Bengt
>>>>>>>>>>> cheers,
>>>>>>>>>>> /Per
>>>>>>>>>>>> Currently the GC ID, that is used for event tracing and 
>>>>>>>>>>>> logging, is
>>>>>>>>>>>> stored in the GCTracer object. That has caused some minor 
>>>>>>>>>>>> problems since
>>>>>>>>>>>> there are multiple GCTracers active at the same time. The 
>>>>>>>>>>>> correct GC IDs
>>>>>>>>>>>> need to be passed around in many places.
>>>>>>>>>>>> For some upcoming GC logging changes I would like to have a 
>>>>>>>>>>>> more
>>>>>>>>>>>> consistent way of finding the currently active GC ID. I've 
>>>>>>>>>>>> played around
>>>>>>>>>>>> with a couple of different solutions, but eventually I 
>>>>>>>>>>>> found that the
>>>>>>>>>>>> simplest solution is to store the current GC ID in the 
>>>>>>>>>>>> thread. That way
>>>>>>>>>>>> there is a single way to look it up in all places. It is 
>>>>>>>>>>>> also fairly
>>>>>>>>>>>> straight forward to set it up.
>>>>>>>>>>>> I've reworked the GCId class a bit to support this and I've 
>>>>>>>>>>>> introduced a
>>>>>>>>>>>> GCIdMark class that is a scoped object that helps to set up 
>>>>>>>>>>>> and tear
>>>>>>>>>>>> down a current GC ID. By moving the GC ID out from the 
>>>>>>>>>>>> GCTracer class I
>>>>>>>>>>>> got rid of a few minor issues as well. One is that we no 
>>>>>>>>>>>> longer need to
>>>>>>>>>>>> keep track of the G1 "aborted concurrent GC ID". That was 
>>>>>>>>>>>> necessary
>>>>>>>>>>>> before since we did logging *after* we had told the 
>>>>>>>>>>>> corresponding
>>>>>>>>>>>> GCTracer that the concurrent cycle was aborted and it had 
>>>>>>>>>>>> thrown its GC
>>>>>>>>>>>> ID away. Now the GC ID can stay in scope until we have 
>>>>>>>>>>>> completed all
>>>>>>>>>>>> logging.
>>>>>>>>>>>> Also the HeapDumpBeforeFullGC and 
>>>>>>>>>>>> PrintClassHistogramBeforeFullGC used
>>>>>>>>>>>> to have to create a new GC ID since their logging was done 
>>>>>>>>>>>> before we had
>>>>>>>>>>>> the correct GCTracer set up. Now the GC ID can be available 
>>>>>>>>>>>> early enough
>>>>>>>>>>>> for this logging to be using the same (and correct) GC ID 
>>>>>>>>>>>> as the rest of
>>>>>>>>>>>> the GC. Same for HeapDumpAfterFullGC and 
>>>>>>>>>>>> PrintClassHistogramAfterFullGC.
>>>>>>>>>>>> I've added an uint to the Thread class to keep track of the 
>>>>>>>>>>>> currently
>>>>>>>>>>>> active GC ID. In the current code there are actually only 
>>>>>>>>>>>> NamedThreads
>>>>>>>>>>>> that need the GC ID. So, I'm not sure where the best place 
>>>>>>>>>>>> is for this
>>>>>>>>>>>> field is. But it seems like the Thread class contains most 
>>>>>>>>>>>> of the data,
>>>>>>>>>>>> even data that is only used by some subclasses, so I opted 
>>>>>>>>>>>> for putting
>>>>>>>>>>>> the field in Thread as opposed to in NamedThread. This 
>>>>>>>>>>>> opens up for
>>>>>>>>>>>> possible future extensions when Java threads may do part of 
>>>>>>>>>>>> the GC work.
>>>>>>>>>>>> However, I'm open to moving the field down to NamedThread 
>>>>>>>>>>>> if that is
>>>>>>>>>>>> seen as better.
>>>>>>>>>>>> In the GCTracer class there were many asserts that were 
>>>>>>>>>>>> checking that
>>>>>>>>>>>> the GC ID was properly set up. Mostly these assert verify 
>>>>>>>>>>>> that start/end
>>>>>>>>>>>> is called correctly and that the other methods are called 
>>>>>>>>>>>> inside of the
>>>>>>>>>>>> start/end scope. Now that the GC ID is moved out of the 
>>>>>>>>>>>> GCTracer class
>>>>>>>>>>>> these asserts had to be dealt with. I figured there were 
>>>>>>>>>>>> three ways to
>>>>>>>>>>>> handle it; just remove them, replace them with check that 
>>>>>>>>>>>> the GC ID from
>>>>>>>>>>>> the current thread is correct, or implement a new status 
>>>>>>>>>>>> field to keep
>>>>>>>>>>>> track of GC start/end state. Personally I'm not sure these 
>>>>>>>>>>>> asserts are
>>>>>>>>>>>> valuable enough to spend time on, so I went for the first 
>>>>>>>>>>>> approach:
>>>>>>>>>>>> removing them. I don't think making them use the thread GC 
>>>>>>>>>>>> ID will be
>>>>>>>>>>>> appropriate. But if anyone feels strongly about these 
>>>>>>>>>>>> asserts I can
>>>>>>>>>>>> implement a separate start/end state.
>>>>>>>>>>>> There are a few "Tasks" in the GC code that are executed by 
>>>>>>>>>>>> worker
>>>>>>>>>>>> threads. To make the worker threads use the correct GC ID I 
>>>>>>>>>>>> make sure
>>>>>>>>>>>> that the Task constructors copy the GC ID from the 
>>>>>>>>>>>> initiating thread
>>>>>>>>>>>> into a local variable. When the task is executed in its 
>>>>>>>>>>>> worker thread it
>>>>>>>>>>>> sets up the GC ID based on the local variable.
>>>>>>>>>>>> The proposed change does not alter any logging (except for 
>>>>>>>>>>>> the small bug
>>>>>>>>>>>> fix for
>>>>>>>>>>>> HeapDumpBefore(After)FullGC/PrintClassHistogramBefore(After)FullGC. 
>>>>>>>>>>>> This
>>>>>>>>>>>> means that no existing tests need to be updated. In 
>>>>>>>>>>>> particular the
>>>>>>>>>>>> test/gc/logging/TestGCId.java test passes even after these 
>>>>>>>>>>>> changes.
>>>>>>>>>>>> A big thanks to Per, who pre-reviewed these changes and 
>>>>>>>>>>>> came with some
>>>>>>>>>>>> really good feedback.
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Bengt

More information about the hotspot-dev mailing list