RFR (M): 8146987: Improve Parallel GC Full GC by caching results of live_words_in_range() [Was: Re: [PATCH] enhancement to ParallelScavenge Full GC]

Thomas Schatzl thomas.schatzl at oracle.com
Fri Jan 15 12:49:53 UTC 2016

On Thu, 2016-01-14 at 12:09 -0800, Jon Masamitsu wrote:
> http://cr.openjdk.java.net/~tschatzl/8146987/webrev.1/src/share/vm/gc/parallel/parMarkBitMap.cpp.frames.html
>  143   } else if (end_obj < last_obj) {
>  144     if (pointer_delta((HeapWord*)end_obj, (HeapWord*)beg_addr) > pointer_delta((HeapWord*)last_obj, (HeapWord*)end_obj)) {
>  145       last_ret = last_ret - live_words_in_range_helper((HeapWord*)end_obj, last_obj);
>  146     } else {
>  147       last_ret = live_words_in_range_helper(beg_addr, end_obj);
>  148     }
> Did you measure the performance improvement afforded by lines
> 144 - 145?
> The calculation of the new address is used in two cases. One is when
> the live object is being moved to its new location. In that case
> I would expect that the overwhelmingly common case would be
> end_obj > last_obj. The calculation of the new location for a live
> object (where live_words_in_range() is used) proceeds from left to
> right (lower to higher addresses) as each region is scanned looking
> for live objects. I would expect the execution of line 145 to be
> seldom if ever, so that just using 147 would be fine. The other case
> is less clear. When an object is being moved, the object references
> within it are updated. That access pattern seems like it would be
> more random to me (fewer cache hits) but if you have data that shows
> that line 145 is beneficial, that would be a good data point.

I did some measurements on which branches are taken in
live_words_in_range() with SPECjbb2015 at a constant IR (injection
rate; I already had that setup and had used it for before/after
comparison runs, see other email) and with the adaptive size policy
clamped down (basically setting all heap sizes: 10g total heap, ~1g
live set).
See the patch at
https://bugs.openjdk.java.net/secure/attachment/56451/reftypes.diff
and the results at
https://bugs.openjdk.java.net/secure/attachment/56452/r

That graph shows the kind of branching decisions taken during execution
of every full gc (x-axis is the full gc number, y-axis is the relative
branch execution frequency). Labels include the index into the array
containing the values.

Ignore the first and last few full gcs; they are startup and shutdown
related (i.e. System.gc() calls and the like).
From this you can see that the branch in lines 144-145 is actually
rather important (yellow - 4), as it catches a large number of
references that would otherwise need a full call to
live_words_in_range() (turquoise - 5).

A quick run with lines 144-145 removed showed that this is indeed a
problem: around half of the improvement from this patch is lost if
this condition is removed. So I would opt to keep it :)
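
To make the branches under discussion easier to follow, here is a
minimal standalone sketch of the caching idea. It is not the code from
the webrev: the class LiveWordCounter, the std::vector<bool> standing
in for the mark bitmap, the index-based interface and the
words_scanned() counter are all assumptions made for illustration. The
"backward" case corresponds to lines 144-145, the recount to line 147.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the mark bitmap: live[i] is true if heap
// word i is live. Not the HotSpot ParMarkBitMap API.
class LiveWordCounter {
public:
  explicit LiveWordCounter(std::vector<bool> live) : live_(std::move(live)) {}

  // Cached count of live words in [beg, end).
  std::size_t live_words_in_range(std::size_t beg, std::size_t end) {
    assert(beg <= end && end <= live_.size());
    if (!valid_ || beg != last_beg_) {
      last_ret_ = scan(beg, end);            // cache miss: full count
    } else if (end == last_end_) {
      // exact hit: reuse the cached result unchanged
    } else if (end > last_end_) {
      last_ret_ += scan(last_end_, end);     // forward: count only the extension
    } else if (end - beg > last_end_ - end) {
      last_ret_ -= scan(end, last_end_);     // backward: subtract the short tail
    } else {
      last_ret_ = scan(beg, end);            // backward: recounting is shorter
    }
    valid_ = true;
    last_beg_ = beg;
    last_end_ = end;
    return last_ret_;
  }

  // Total number of bitmap words examined so far (illustration only).
  std::size_t words_scanned() const { return scanned_; }

private:
  // Uncached helper: walks every word in [beg, end).
  std::size_t scan(std::size_t beg, std::size_t end) {
    std::size_t n = 0;
    for (std::size_t i = beg; i < end; ++i) {
      if (live_[i]) ++n;
    }
    scanned_ += end - beg;
    return n;
  }

  std::vector<bool> live_;
  bool valid_ = false;
  std::size_t last_beg_ = 0;
  std::size_t last_end_ = 0;
  std::size_t last_ret_ = 0;
  std::size_t scanned_ = 0;
};
```

In this sketch, a query that ends slightly before the previous one
only walks the gap between the two end points instead of the whole
range, which is the situation the yellow branch in the graph captures.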

With the usual disclaimer that this is only a single data point.


More information about the hotspot-gc-dev mailing list