RFR (M): JDK-8068394: Trace event for concurrent GC phases

Bengt Rutisson bengt.rutisson at oracle.com
Wed Dec 2 09:00:08 UTC 2015

Hi Sangheon,

Sorry again for the late reply. Comments inline.

On 2015-11-24 19:48, sangheon.kim wrote:
> Hi Bengt,
> Thank you so much for reviewing this patch!
> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>> Hi Sangheon,
>> Sorry for the very late reply to this review request.
>> On 2015-10-22 01:40, sangheon.kim wrote:
>>> Hi all,
>>> Can I get some reviews for this change of adding a trace event for 
>>> concurrent phases on CMS and G1?
>>> Currently we only measure pause times for major phases.
>>> But I want to add 'concurrent mark' and 'concurrent sweep' phases 
>>> for CMS and 'concurrent mark' phase for G1.
>>> To achieve this, I had to change ConcurrentGCTimer and related classes.
>>> This patch includes:
>>> 1) A concurrent phase can be measured only from ConcurrentGCTimer 
>>> and measuring an overlap between concurrent and pause is not allowed 
>>> as currently we don't have that use case. And TimePartitions 
>>> class(or related classes) will be simpler than an overlap allowed case.
>>> 2) I removed PausePhase and ConcurrentPhase which are derived from 
>>> GCPhase because I wanted to avoid heap allocation when adding to 
>>> GrowableArray. Instead introduced 'type' member variable at GCPhase.
>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>> Webrev: http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>> Nice work! It is great to get some timing information for the 
>> concurrent phases.
> Thanks!
>> A few questions/comments:
>> CMS.
>> You have added timing events for the concurrent phases mark and 
>> sweep, but not for other concurrent phases (preclean, 
>> abortable_preclean and reset_concurrent). I think that if you moved 
>> your calls to _gc_timer_cm->register_gc_concurrent_start() and 
>> _gc_timer_cm->register_gc_concurrent_end() into the constructor and 
>> destructor of CMSPhaseAccounting you would automatically get timing 
>> events for all concurrent phases.
> I also considered about this idea but I was not clear whether it is 
> good measurement especially for 'sweep'.
> We are using CMSPhaseAccounting for 5 cases.
> I think for 'mark' and 'preclean' they are okay. (there are 
> non-product functions call before CMSPhaseAccounting. But they are okay).
> For 'abortable-preclean' and 'reset', they are good to have that.
> But when it comes to 'sweep', there are timer and 
> CMSTokenSyncWithLocks related codes and I was not sure about these.
> If you think they are also okay, I will change as you suggested.

I think it is fine for the sweep part as well. If the CMSPhaseAccounting 
instance is not measuring the correct part of the work I think we should 
change it to measure what we want. That way we improve the existing 
measurements too.

But as you stated in your follow up email, I think the sweep code is 
actually pretty much ok from this perspective.

So, I would prefer to go with the CMSPhaseAccounting approach.

>> G1.
>> I think the use of _concurrent_marking_from_roots is unfortunate. It 
>> would be much cleaner if 
>> ConcurrentMark::register_mark_from_roots_phase_end() called 
>> register_gc_concurrent_end() directly. 
> I wanted to avoid introducing the new flag but the reason of it was, 
> as you know, to handle 'abort'.
> When 'abort' happens we do need to end all timers before calling 
> 'GCTracer::report_gc_end()'.
> And at this point we need a way to know whether concurrent timer is 
> started or not.

As I tried to describe I think you can extend the scope of the timing to 
cover the part until the marking thread discovers that the marking was 
aborted. Basically moving the timing in to 

>> I realize that this would change the timing for when a concurrent 
>> marking is aborted. The whole time for a full GC (or even several 
>> full GCs) would be included in the concurrent marking phase. But from 
>> a code perspective that is what happens, so maybe that is the correct 
>> time to report? Also, I think the logging is reported that way so if 
>> we want to make it easy to match the timing events with the logging 
>> we might want to use about the same scope for timing.
>> Why do we only measure the concurrent mark phase for G1? 
> This is good question. :)
> I was comparing CMS and G1 for major concurrent phases.
> And the candidates were 'concurrent mark/sweep (CMS)' and 'concurrent 
> mark(G1)'.
>> Wouldn't it be good to measure all concurrent phases?
> Okay, Stefan Karlsson also pointed to have them as well so I filed a 
> separate CR for this.
> https://bugs.openjdk.java.net/browse/JDK-8143951

Ok. Is there a reason why we want to split this up into two changes? Why 
not add all the concurrent timing events at once?

I'm ok with splitting it up into two changes, but then maybe the title 
for JDK-8068394 should be changed to indicate that it only adds events 
for the marking phase. Also, if we go with the CMSPhaseAccounting 
approach for CMS, you get events for all phases there. So, it will be a 
bit odd to have all concurrent phases handled for CMS but only 
concurrent mark for G1.

>> gcTraceSend.cpp
>>  389   void visit(GCPhase* phase) {
>>  390     switch (phase->type()) {
>>  391       case GCPhase::PausePhaseType:
>>  392         assert(phase->level() < PhasesStack::PHASE_LEVELS, "Need 
>> more event types for PausePhase");
>>  393
>>  394         switch (phase->level()) {
>>  395           case 0: send_phase<EventGCPhasePause>(phase); break;
>>  396           case 1: send_phase<EventGCPhasePauseLevel1>(phase); 
>> break;
>>  397           case 2: send_phase<EventGCPhasePauseLevel2>(phase); 
>> break;
>>  398           case 3: send_phase<EventGCPhasePauseLevel3>(phase); 
>> break;
>>  399           default: /* Ignore sending this phase */ break;
>>  400         }
>>  401         break;
>>  402
>>  403       case GCPhase::ConcurrentPhaseType:
>>  404         assert(phase->level() < 1, "There's only one level for 
>> ConcurrentPhase");
>>  405
>>  406         switch (phase->level()) {
>>  407           case 0: send_phase<EventGCPhaseConcurrent>(phase); break;
>>  408           default: /* Ignore sending this phase */ break;
>>  409         }
>>  410         break;
>>  411     }
>>  412   }
>>  413 };
>> Since there are only two values for GCPhase::PhaseType it seems a bit 
>> odd to use a switch statement. I think I would prefer to factor out 
>> the code for the different types a bit too. So, maybe something like:
>> void visit(GCPhase* phase) {
>>   if (phase->type() == GCPhase::PausePhaseType) {
>>     visit_pause(phase);
>>   } else {
>>     assert(phase->type() == GCPhase::ConcurrentPhaseType, "");
>>     visit_concurrent(phase);
>>   }
>> }
> This seems better. I will fix.

Good. Thanks!


> Thanks,
> Sangheon
>> Thanks,
>> Bengt
>>> Testing: JPRT
>>> Thanks,
>>> Sangheon

More information about the hotspot-gc-dev mailing list