RFR (M): 8136681: Factor out IHOP calculation from G1CollectorPolicy

Thomas Schatzl thomas.schatzl at oracle.com
Tue Nov 17 08:14:43 UTC 2015


On Mon, 2015-11-16 at 11:02 -0800, Jon Masamitsu wrote:
> On 11/16/2015 05:31 AM, Thomas Schatzl wrote:
> > Hi Jon,
> >
> >    thanks a lot for all these reminders for better documentation. I have
> > been working too long on this functionality so that "everything is
> > clear" to me :)
> >
> > New webrevs with hopefully more complete explanations at:
> > http://cr.openjdk.java.net/~tschatzl/8136681/webrev.1_to_2/
> > (incremental)
> > http://cr.openjdk.java.net/~tschatzl/8136681/webrev.2/ (changes)
> >
> >
> > On Fri, 2015-11-13 at 07:58 -0800, Jon Masamitsu wrote:
> >> Thomas,
> >>
> >> This is partial.  If you send out a second webrev based on Mikael's
> >> review, I'll finish with that.
> >>
> >>
> >> http://cr.openjdk.java.net/~tschatzl/8136681/webrev/src/share/vm/gc/g1/g1CollectedHeap.hpp.frames.html
> >>
> >>> 1370 // Returns the number of regions the humongous object of the
> >>> given word size
> >>> 1371 // covers.
> >> "covers" is not quite right since to me it says that the humongous
> >> object completely uses the
> >> region.  I'd use "requires".
> > Fixed.
> >
> >> http://cr.openjdk.java.net/~tschatzl/8136681/webrev/src/share/vm/gc/g1/g1IHOPControl.hpp.html
> >>
> >>     49   // Update information about recent time during which allocations happened,
> >>     50   // how many allocations happened and an additional safety buffer.
> >>
> >> // Update information about
> >>
> >> //   Time during which allocations occurred (sum of mutator execution time + GC pause times)
> >> OR
> >> //   Concurrent marking time (concurrent mark end - concurrent mark start)
> >>
> >> //   Allocations in bytes during that time
> >> //   Safety buffer ???
> >>
> >> I couldn't figure out what the safety buffer is supposed to be.  It seems to
> >> be the young gen size but don't know why.
> > I tried to explain it in the text. In short, in G1 the IHOP value is
> > based on old gen occupancy only. The problem is that the young gen also
> > needs to be allocated somewhere too.
> >
> > Now you could just say, use the maximum young gen size. However this is
> > 60% of the heap by default... so the adaptive IHOP algorithm uses a
> > measure of the young gen that is not bounded by G1ReservePercent.
> >
> > The reason to use the unbounded value is because if the code used the
> > bounded one, it would cancel out with G1ReservePercent, because the
> > closer we get to G1ReservePercent, the smaller that bounded value would
> > get, which would make the current IHOP value rise etc, which would delay
> > the initiation of the marking.
> >
> > That would end up losing throughput as then the young gen gets smaller
> > and smaller (and GC frequency increases), it can take a long time until
> > G1 gets close enough to G1ReservePercent so that the other factors
> > (allocation rate, marking time) are used.
> >
> > Basically initial mark will be delayed until young gen reaches its
> > minimum size, at which time G1 will continue to use that young gen size
> > until marking occurs. Which means that typically G1 will eat into
> > G1ReservePercent, which we also do not want.
> >
> > Additionally it would get G1 into more trouble with regard to pause
> > time, giving it less room during that time.
> I think I understand the issue of using an unbounded young gen but
> what precisely is meant by "measure of the young gen"?  By measure
> do you mean you used the size of young gen from recent young-only
> collections?

As in "measurement". Yes, this uses the size of the young gen from a
recent young-only collection, taken when the decision whether to start
marking or not occurs. That size is supposed to be on the large side
compared to the young gen sizes following during marking.
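
To make the role of that young gen measurement concrete, here is a
minimal sketch of how such a threshold could be derived. It is not the
code from the webrev: the struct, the field names and the exact formula
are assumptions for illustration, built only from the factors mentioned
in this thread (old gen allocation rate, marking time, the unbounded
young gen size as safety buffer, and the reserve).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inputs; none of these names come from the actual change.
struct IhopInputs {
  std::size_t heap_capacity_bytes;
  unsigned    reserve_percent;           // stand-in for G1ReservePercent
  double      alloc_rate_bytes_per_sec;  // recent old gen allocation rate
  double      marking_time_sec;          // recent concurrent marking duration
  std::size_t young_gen_bytes;           // young gen size NOT bounded by the reserve
};

// Returns the old gen occupancy (in bytes) at which marking should start
// so that marking can finish before the heap, minus the reserve, fills up.
inline std::size_t ihop_threshold_bytes(const IhopInputs& in) {
  assert(in.reserve_percent <= 100);
  assert(in.alloc_rate_bytes_per_sec >= 0.0 && in.marking_time_sec >= 0.0);

  // Integer arithmetic: truncates for capacities that are not multiples of 100.
  const std::size_t reserve = in.heap_capacity_bytes / 100 * in.reserve_percent;
  const std::size_t target  = in.heap_capacity_bytes - reserve;

  // Space consumed while marking runs, plus the young gen "safety buffer".
  const double needed = in.alloc_rate_bytes_per_sec * in.marking_time_sec
                      + static_cast<double>(in.young_gen_bytes);

  if (needed >= static_cast<double>(target)) {
    return 0;  // start marking immediately
  }
  return target - static_cast<std::size_t>(needed);
}
```

In this sketch a larger young gen measurement lowers the threshold, so
marking starts earlier. Using a young gen size that shrinks as the
reserve is approached would raise the threshold again, which is the
cancelling-out effect described above.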

> > Unfortunately G1 is in two minds about this, i.e. used() for humongous
> > objects does not contain the "waste" at the end of the last region, but
> > used() for regular regions does.
> So occupancy is not useful and you use free regions + free in the current
> old allocation regions?

I was referring to the fact that G1's used() for humongous regions seems
to return an incorrect value.

That's why when adding the allocation information for humongous regions
(around line 1018 in g1CollectedHeap.cpp) the change first calculates
the size in regions and multiplies it by the full region size, instead
of using something like used().
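
As an illustration of that accounting, a minimal sketch follows. The
function names and parameters are hypothetical and are not the ones used
in g1CollectedHeap.cpp; the point is only that the object size is
rounded up to whole regions, so the unused tail of the last region is
counted as allocated.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of regions that a humongous object of
// word_size words requires, rounded up to whole regions.
inline std::size_t humongous_regions_required(std::size_t word_size,
                                              std::size_t region_words) {
  assert(region_words > 0 && "region size must be non-zero");
  return (word_size + region_words - 1) / region_words;
}

// Bytes accounted as allocated for the object: full regions, including
// the "waste" at the end of the last region.
inline std::size_t humongous_allocated_bytes(std::size_t word_size,
                                             std::size_t region_words,
                                             std::size_t bytes_per_word) {
  return humongous_regions_required(word_size, region_words)
         * region_words * bytes_per_word;
}
```

For example, with regions of 1024 words, an object of 1025 words
requires two regions and is accounted as two full regions.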

> >> Have you given much thought to what effect a to-space exhaustion
> >> should have on IHOP?  I understand that it is not the design center
> >> for this but I think that to-space exhaustion can confuse the
> >> statistics.   Maybe a reset of the statistics and dropping the IHOP
> >> to a small value (current heap occupancy or even 0) until you get 3
> >> successful marking cycles.  Or think about it later.
> > I already thought a little about how this should interact with regards
> > to the calculation: the idea is that basically the algorithm will notice
> > that there is a significant amount of additional allocation, and will
> > lower the IHOP threshold automatically. (Looking through the code I
> > think I saw some problems here, I will look into fixing that)
> The increased allocation would be the allocation from compacting
> all the live data into regions during the full GC.  That certainly should

No, from converting survivor/eden regions into old regions. That will
result in a huge bump in allocation rate if the evac failure has been
serious, i.e. there were a lot of failed evacuations. This means that
in the future, the IHOP will be lower.

If the evacuation failure has not been so serious, the impact is
certainly smaller.
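
A rough sketch of why the allocation rate jumps, assuming (for
illustration only) that every survivor/eden region converted to old
after a failed evacuation is counted as old gen allocation for that
interval. The function and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: old gen allocation rate observed over one interval.
// converted_regions is the number of young regions that were turned
// into old regions because evacuation failed.
inline double old_gen_alloc_rate(std::size_t promoted_bytes,
                                 std::size_t converted_regions,
                                 std::size_t region_bytes,
                                 double interval_sec) {
  assert(interval_sec > 0.0);
  const std::size_t total = promoted_bytes + converted_regions * region_bytes;
  return static_cast<double>(total) / interval_sec;
}
```

With few converted regions the rate barely moves; with many, it
dominates the normal promotion volume, and any threshold derived from
that rate drops accordingly.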

The impact of evacuation failures is already much smaller than before,
and there are a few more things that could be done to make it even

The only drawback is that potentially this decrease in IHOP is too small
to avoid the next evac failure/full gc. None of our predictions can
handle long-term cyclic occurrences (like once a day there is a
significant brief 30s spike in promotion rate), so I do not see that as
a particular issue.

That's something that the user needs to tune out at this time (CMS would
not be able to handle this either).

> make the IHOP drop but that seems rather weakly related to the
> actual IHOP needed to avoid promotion failure.   It's hard for me
> to see how that is going to scale (i.e., it seems complicated to
> use something like the live data size as input to the modeling
> of IHOP). I'd start with something really simple but if you're
> comfortable with it, that's fine.
> >
> > If evacuation failure happens in a gc during marking, there are a few
> > options, I have not decided on what's best:
> >
> > - do nothing because the user asked to run an impossible to manage
> > workload (combination of live set, allocation rate, and other options).
> >
> >    - there is already some log output which that information can be
> > derived from.
> >
> > - allow the user to set a maximum IHOP threshold. He could base this
> > value on the log messages he gets.
> >    - the user can already do that by increasing G1ReservePercent btw
> >
> > - make sure that marking completes in time
> >
> >    - let the mutator threads (or during young gcs while marking is
> > running) do some marking if we notice that we do not have enough time.
> > Not sure if it is worth the effort.
> >
> >    - fix some bugs in marking :) that prevent that in extraordinary
> > conditions.

- another option would be to just start doing mixed gcs as G1 is able to
do that without any completed marking (even during marking), hoping that
this will yield enough space.

> >
> >    - make sure that we always start marking early enough by making sure
> > that mixed gc reclaims enough memory. Planning some work here as part of
> > generic work on improving gc policy.
> I like this one above.
> Thanks for the extra explanations.

Thanks for the discussion.


