RFR (M): 8013895 G1: G1SummarizeRSetStats output on Linux needs improvement
john.cuthbertson at oracle.com
Fri May 24 16:19:05 UTC 2013
I'll look at it today.
On 5/24/2013 12:47 AM, Thomas Schatzl wrote:
> could any Reviewer for this change have a look - I have a review from
> Thanks a lot,
> On Wed, 2013-05-15 at 19:23 +0200, Thomas Schatzl wrote:
>> Hi all,
>> I'd like to add some further bug fix to this CR, as it only concerns a
>> single line.
>> In the original code to retrieve remembered set statistics of the
>> regions, when iterating over the heap, the code excluded regions that
>> continued humongous regions.
>> This results in incorrect totals: while the continued humongous regions do not
>> have remembered set entries, they have a separate remembered set
>> instance. So when skipping them, the sizes of the empty remembered sets
>> are missing from the totals. (Note that even an empty remembered set
>> needs a few k of heap.)
>> The update webrev at
>> fixes that. In HRRSStatsIter::doHeapRegion() it removes the first line
>> of the implementation that skips continued humongous regions (i.e. the
>> line "if (r->continuesHumongous()) return false;").
>> I verified by testing that the remembered set instances of continued
>> humongous regions are different from the remembered set instance of the
>> humongous starting region; had they been shared, that would have
>> explained this code.
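The effect of the removed early return on the totals can be illustrated with a small stand-alone sketch. This is not HotSpot code: `Region`, `rem_set_bytes` and `total_rem_set_bytes` are hypothetical stand-ins for HeapRegion, its remembered set size, and the work done by HRRSStatsIter, chosen only to show why skipping continued humongous regions undercounts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region; not the HotSpot HeapRegion class.
struct Region {
    bool continues_humongous;   // true for the 2nd..nth region of a humongous object
    std::size_t rem_set_bytes;  // memory used by this region's own remembered set
};

// Sums remembered set memory over all regions.
// skip_continues == true mimics the removed early return,
// skip_continues == false mimics the behaviour after the fix.
std::size_t total_rem_set_bytes(const std::vector<Region>& heap, bool skip_continues) {
    std::size_t total = 0;
    for (const Region& r : heap) {
        if (skip_continues && r.continues_humongous) {
            continue;  // the empty but still allocated remembered set is not counted
        }
        total += r.rem_set_bytes;
    }
    return total;
}
```

With two continued humongous regions that each carry a 2K empty remembered set, the skipping variant reports 4K less than the regions actually use.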
>> I reran JPRT and the jtreg tests successfully.
>> The rest of the information about this change still applies:
>> On Mon, 2013-05-13 at 23:25 +0200, Thomas Schatzl wrote:
>>> Hi all,
>>> I'd like to ask you for reviews for this change.
>>> It improves the output of G1SummarizeRSetStats on all platforms.
>>> The previous output of G1SummarizeRSetStats has the following issues as shown in the example output from the CR:
>>> Concurrent RS processed -1761455529 cards
>>> Of 9224262 completed buffers:
>>> 9224262 (100.0%) by conc RS threads.
>>> 0 ( 0.0%) by mutator threads.
>>> Conc RS threads times(s)
>>> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> Total heap region rem set sizes = 1120947K. Max = 4047K.
>>> Static structures = 600K, free_lists = 5312K.
>>> 717820015 occupied cards represented.
>>> Max size region = 378:(O)[0x00000004fa000000,0x00000004fb000000,0x00000004fb000000], size = 4048K, occupied = 37303K.
>>> Did 209966 coarsenings.
>>> "Concurrent RS processed" uses the wrong data type, overflowing easily.
>>> "Conc RS threads times" are always zero except on Solaris
>>> "Conc RS threads times" reports the young gen sampling thread as remembered set thread.
>>> The data for everything except for the remembered set sizes (beginning
>>> at "Total heap region...") is always cumulative, which is hard to
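The negative "Concurrent RS processed" value in the example output is what a 32-bit signed counter shows once the real count exceeds 2^31 - 1. The sketch below is only an illustration: `as_wrapped_int32` is a hypothetical helper, and the real card count behind the example is not known. Any count congruent to 2533511767 modulo 2^32 would print as -1761455529.

```cpp
#include <cstdint>
#include <cstring>

// Returns what a 32-bit signed counter would show for a given 64-bit count:
// the low 32 bits of the count, reinterpreted as a signed value.
std::int32_t as_wrapped_int32(std::uint64_t cards) {
    const std::uint32_t low = static_cast<std::uint32_t>(cards);  // keep the low 32 bits
    std::int32_t shown;
    std::memcpy(&shown, &low, sizeof shown);  // reinterpret the bits without a narrowing cast
    return shown;
}
```

For example, as_wrapped_int32(2533511767) yields -1761455529, while a small count such as 1504863 is unaffected.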
>>> This change includes the following modifications:
>>> - use a larger data type for the number of concurrent RS processed
>>> - avoid division by zero when calculating percentages for the completed
>>> buffers when the number of completed buffers is still zero
>>> - previously printed the summary only for young gen gcs, now also
>>> printing the summary at full gcs
>>> - split out the young gen sampling thread from the conc RS threads times
>>> - fixed the missing space between "times" and "(s)" in the same line
>>> - changed slightly how the G1SummarizeRSetStats and
>>> G1SummarizeRSetStatsPeriod flags work. If G1SummarizeRSetStats is true,
>>> always print a cumulative summary at VM exit; if
>>> G1SummarizeRSetStatsPeriod is > 0 (default), print a cumulative summary
>>> for the last G1SummarizeRSetStatsPeriod GCs; i.e. if
>>> G1SummarizeRSetStatsPeriod is 1, print the summary for the most recent GC.
>>> - fix gathering of thread times:
>>> - on Linux supporting getrusage(RUSAGE_THREAD, ...), use this
>>> information, otherwise fall back to the elapsed time as suggested in the
>>> - on Windows, use the GetThreadTimes() system API call, falling back to
>>> elapsed time
>>> - on BSD, use the elapsed time counter as I do not know an API call.
>>> (getrusage() does not seem to support RUSAGE_THREAD there)
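The Linux approach with its fallback can be sketched as follows. This is a minimal illustration, not the code from the patch: `elapsed_seconds`, `timeval_to_seconds` and `current_thread_time_seconds` are hypothetical names, summing user and system time is an assumption, and the compile-time `#ifdef RUSAGE_THREAD` check stands in for whatever availability test the actual change performs.

```cpp
#include <chrono>
#include <sys/resource.h>
#include <sys/time.h>

// Fallback: wall-clock seconds elapsed since the first call in this process.
double elapsed_seconds() {
    static const auto start = std::chrono::steady_clock::now();
    const auto now = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(now - start).count();
}

// Converts a timeval (seconds plus microseconds) to seconds.
double timeval_to_seconds(const timeval& tv) {
    return static_cast<double>(tv.tv_sec) + static_cast<double>(tv.tv_usec) / 1e6;
}

// CPU seconds (user + system) consumed by the calling thread where
// getrusage(RUSAGE_THREAD, ...) is available and succeeds; elapsed time otherwise.
double current_thread_time_seconds() {
#ifdef RUSAGE_THREAD
    struct rusage usage;
    if (getrusage(RUSAGE_THREAD, &usage) == 0) {
        return timeval_to_seconds(usage.ru_utime) + timeval_to_seconds(usage.ru_stime);
    }
#endif
    return elapsed_seconds();
}
```

On platforms whose headers do not define RUSAGE_THREAD, the function compiles down to the elapsed-time fallback only.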
>>> Here is an example output for printing the summary for a different
>>> application on a different machine:
>>> Cumulative RS summary
>>> Concurrent RS processed 1504863 cards
>>> Of 7194 completed buffers:
>>> 7194 (100.0%) by concurrent RS threads.
>>> 0 ( 0.0%) by mutator threads.
>>> Concurrent RS threads times (s)
>>> Concurrent sampling threads times (s)
>>> Total heap region rem set sizes = 119K. Max = 4K.
>>> Static structures = 7K, free_lists = 209K.
>>> 23253 occupied cards represented.
>>> Max size region =
>>> 0:(O)[0x00000000c2200000,0x00000000c22fffe0,0x00000000c2300000], size =
>>> 5K, occupied = 5K.
>>> Did 0 coarsenings.
>>> - the "Cumulative RS summary" header is only for the final printout at
>>> the end of the GC
>>> - separated out "Concurrent RS threads times" and "Concurrent sampling
>>> threads times"
>>> The remembered set information is always for the current situation and
>>> never cumulative in any way as before. There does not seem to be a
>>> useful way to make it cumulative...
>>> Guide to read the patch
>>> - I moved the remembered set summary code to a separate class
>>> G1RemSetSummary, and added new files (g1RemSetSummary.?pp) for it.
>>> - the main change has been about managing the periodicity, as G1 only
>>> ever stores the accumulated data. For this reason, G1RemSet keeps a
>>> local instance of G1RemSetSummary with the values of the last printout.
>>> When it is time to create a new periodic output, a new temporary
>>> G1RemSetSummary is created and filled with the current values from the
>>> various sources, and the last values subtracted from it.
>>> - the various os_* files contain the changes to return thread timing
>>> - changes in G1ConcurrentRefine are mostly about some helper functions
>>> to distinguish between actual worker and the RS sampling thread.
>>> - g1CollectedHeap.cpp contains the change that potentially prints the
>>> summary at every gc, not only at young gc
>>> - the test case verifies how many times the remset summary is actually
>>> printed for a few configurations of G1SummarizeRSetStats and
>>> jprt, test case, manual testing of getrusage(RUSAGE_THREAD, ) and
>>> fallback on Linux x64, manual testing of GetThreadTimes() on Windows 64
>>> bit, no special testing on BSD/OS X 10.7 except noticing that
>>> getrusage(RUSAGE_THREAD) is not available there.
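The periodic bookkeeping described in the guide above (keep the values of the last printout, fill a temporary summary with the current cumulative values, subtract the last values from it) can be sketched roughly like this. `RemSetSummary` is a hypothetical, heavily reduced stand-in for G1RemSetSummary, and the field and function names are assumptions for illustration. `percent_of` shows one way to guard the percentage calculation when a period has no completed buffers; returning 0.0 in that case is also an assumption.

```cpp
#include <cstdint>

// Hypothetical, reduced stand-in for G1RemSetSummary: cumulative counters only.
struct RemSetSummary {
    std::uint64_t processed_cards;
    std::uint64_t buffers_by_rs_threads;
    std::uint64_t buffers_by_mutators;

    // Turns this cumulative snapshot into "change since `last`".
    void subtract(const RemSetSummary& last) {
        processed_cards -= last.processed_cards;
        buffers_by_rs_threads -= last.buffers_by_rs_threads;
        buffers_by_mutators -= last.buffers_by_mutators;
    }
};

// Percentage that stays defined when no buffers were completed in the period.
double percent_of(std::uint64_t part, std::uint64_t total) {
    if (total == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(part) / static_cast<double>(total);
}

// Returns the summary for the period since the last printout and remembers
// the current cumulative values as the new baseline.
RemSetSummary take_period_summary(const RemSetSummary& current, RemSetSummary& last) {
    RemSetSummary period = current;  // temporary summary filled with current values
    period.subtract(last);           // remove everything already reported
    last = current;                  // baseline for the next period
    return period;
}
```

Calling take_period_summary at each periodic printout yields per-period numbers, while the cumulative values themselves remain untouched for the final summary.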
>> Thanks in advance,