RFR (M): 8013895 G1: G1SummarizeRSetStats output on Linux needs improvement

Thomas Schatzl thomas.schatzl at oracle.com
Wed May 15 17:23:37 UTC 2013

Hi all,

  I'd like to add some further bug fix to this CR, as it only concerns a
single line.

In the original code to retrieve remembered set statistics of the
regions, when iterating over the heap, the code excluded regions that
continued humonguous regions.

This results in totals, as while the continued humonguous regions do not
have remembered set entries, they have a separate remembered set
instance. So when skipping them, the size of the empty remembered sets
are missing in the totals. (Note that an even an empty remembered set
needs a few k of heap).

The update webrev at

fixes that. In HRRSStatsIter::doHeapRegion() it removes the first line
of the implementation that skips continued humonguous regions (i.e. the
line "if (r->continuesHumongous()) return false;").

I verified by testing that the remembered set instances of continued
humonguous regions are different than the remembered set instance of the
humonguous starting region - which would have explained this code.

I reran JPRT and the jtreg tests successfully.

The rest of the information about this change still applies:

On Mon, 2013-05-13 at 23:25 +0200, Thomas Schatzl wrote:
> Hi all,
>   I'd like to ask you for reviews for this change.
> It improves the output of G1SummarizeRSetStats on all platforms.
> The previous output of G1SummarizeRSetStats has the following issues as shown in the example output from the CR:
>  Concurrent RS processed -1761455529 cards 
>   Of 9224262 completed buffers: 
>       9224262 (100.0%) by conc RS threads. 
>             0 ( 0.0%) by mutator threads. 
>   Conc RS threads times(s) 
>           0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
>   Total heap region rem set sizes = 1120947K. Max = 4047K. 
>   Static structures = 600K, free_lists = 5312K. 
>     717820015 occupied cards represented. 
>     Max size region = 378:(O)[0x00000004fa000000,0x00000004fb000000,0x00000004fb000000], size = 4048K, occupied = 37303K. 
>     Did 209966 coarsenings. 
> "Concurrent RS processed" uses the wrong data type, overflowing easily.
> "Conc RS threads times" are always zero except on Solaris
> "Conc RS threads times" reports the young gen sampling thread as remembered set thread.
> The data for everything except for the remembered set sizes (beginning
> at "Total heap region..." is always cumulative, which is hard to
> read/evaluate.
> This change includes the following modifications:
> - use a larger data type for the number of concurrent RS processed
> - division by zeros when calculating percentages for the completed
> buffers when the number of completed buffers is still zero
> - previously printed the summary only for young gen gcs, now also
> printing the summary at full gcs
> - split out the young gen sampling thread from the conc RS threads times
> - fixed the missing space between "times" and "(s)" in the same line
> - changed slightly how the G1SummarizeRSetStats
> G1SummarizeRSetStatsPeriod flags work. If G1SummarizeRSetStats is true,
> always print a cumulative summary at VM exit; if
> G1SummarizeRSetStatsPeriod is > 0 (default), print a cumulative summary
> for the last period GCs; i.e. if G1SummarizeRSetStatsPeriod is 1, print
> the summary for the most recent GC.
> - fix gathering of thread times:
>    - on linux supporting getrusage(RUSAGE_THREAD, ...), use this
> information, otherwise fallback to the elapsed time as suggested in the
> CR
>    - on Windows, use the GetThreadTimes() system API call, fallback to
> elapsed time
>    - on BSD, use the elapsed time counter as I do not know an API call.
> (getrusage() does not seem to support RUSAGE_THREAD there)
> Here is an example output for printing the summary for a different
> application on a different machine:
>  Cumulative RS summary
>  Concurrent RS processed 1504863 cards
>   Of 7194 completed buffers:
>          7194 (100.0%) by concurrent RS threads.
>             0 (  0.0%) by mutator threads.
>   Concurrent RS threads times (s)
>           1.81
>   Concurrent sampling threads times (s)
>           0.10
>   Total heap region rem set sizes = 119K.  Max = 4K.
>   Static structures = 7K, free_lists = 209K.
>     23253 occupied cards represented.
>     Max size region =
> 0:(O)[0x00000000c2200000,0x00000000c22fffe0,0x00000000c2300000], size =
> 5K, occupied = 5K.
>     Did 0 coarsenings.
> - the "Cumulative RS summary" header is only for the final printout at
> the end of the GC
> - separated out "Concurrent RS threads times" and "Concurrent sampling
> threads times"
> The remembered set information is always for the current situation and
> never cumulative in any way as before. There does not seem to be a
> useful way to make it cumulative...
> Guide to read the patch
> - I moved the remembered set summary code to a separate class
> G1RemSetSummary, and added new files (g1RemSetSummary.?pp) for it.
> - the main change has been about managing the periodicity, as G1 only
> ever stores the accumulated data. For this reason, G1RemSet keeps a
> local instance of G1RemSetSummary with the values of the last printout.
> When it is time to create a new periodic output, a new temporary
> G1RemSetSummary is created and filled with the current values from the
> various sources, and the last values subtracted from it.
> - the various os_* files contain the changes to return thread timing
> values
> - changes in G1ConcurrentRefine are mostly about some helper functions
> to distinguish between actual worker and the RS sampling thread.
> - g1CollectedHeap.cpp contains the change that potentially prints the
> summary at every gc, not only at young gc
> - the test case verifies how many times the remset summary is actually
> printed for a few configurations of G1SummarizeRSetStats and
> G1SummarizeRSetStatsPeriod.
> Webrev
> http://cr.openjdk.java.net/~tschatzl/8013895/webrev
> Bugs.sun.com
> http://bugs.sun.com/view_bug.do?bug_id=8013895
> Jira:
> https://jbs.oracle.com/bugs/browse/JDK-8013895
> Testing:
> jprt, test case, manual testing of getrusage(RUSAGE_THREAD, ) and
> fallback on Linux x64, manual testing of GetThreadTimes() on Windows 64
> bit, no special testing on BSD/OS X 10.7 except noticing that
> getrusage(RUSAGE_THREAD) is not available there.

Thanks in advance,

More information about the hotspot-gc-dev mailing list