JEP 248: Make G1 the Default Garbage Collector

Kirk Pepperdine kirk.pepperdine at gmail.com
Mon Jun 1 16:26:40 UTC 2015


> 
> 
> That's a fair point, and one I'd be interested in hearing an answer to as
> well.  FWIW, the only GC I know of that's actually used in low latency
> systems is Azul's C4, so I'm not even sure Oracle is trying to target the
> same use cases.  So when we talk about "low latency" GCs, we should
> probably also be clear on what "low" actually means.

C4 certainly pauses less, but I’m finding that people are hitting performance walls with Azul’s VM that they aren’t hitting with OpenJDK. They want shorter pauses with OpenJDK throughput. In that regard I see some very odd configurations being used in low latency applications that I have yet to come close to replicating using G1. A case in point is an app that I’m currently working with. The bulk of the pause times are between 15 and 30ms. The target is < 2ms. Azul is not an option in this case. I’m convinced (not enough data yet) that the problem is RSet processing, as CPU use is very closely aligned with the live set size in tenured. However, the tuning parameters do not let us vary the aggressiveness of the collector so that it can knock down the size of the “live set” in tenured. Sorry, no more time for details as I need to run for a plane. But in short, I don’t believe we yet have the tools to tune apps, even ones that aren’t this extreme in their pause time requirements.
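
For anyone wanting a first rough look at numbers like these on their own JVM, here is a minimal sketch. It is not taken from the application described above. It assumes only the standard java.lang.management API, and the names GcSummary, averageMillis and describeCollectors are made up for illustration. Note that getCollectionTime() reports approximate accumulated elapsed time per collector, which is only a coarse proxy for stop-the-world pause time; GC logs remain the better source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcSummary {

    // Average elapsed milliseconds per collection; 0.0 when the count is
    // zero or undefined (the MXBean reports -1 for "undefined").
    public static double averageMillis(long count, long totalMillis) {
        return count <= 0 ? 0.0 : (double) totalMillis / count;
    }

    // One line per collector registered in this JVM: its name, how many
    // collections it has run, and the average elapsed time per collection.
    public static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long totalMillis = gc.getCollectionTime();
            lines.add(String.format("%s: %d collections, %.2f ms average",
                    gc.getName(), Math.max(count, 0), averageMillis(count, totalMillis)));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

The collector names printed also show which collector the JVM actually selected, which is handy when no GC was specified on the command line.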

Regards,
Kirk

> 
> Vitaly
> 
> 
> On Mon, Jun 1, 2015 at 10:42 AM, Ben Evans <ben at jclarity.com> wrote:
> 
>> Hi Vitaly,
>> 
>> (I've added hotspot-dev back on to the To: line as I think it's
>> important this discussion is had in public).
>> 
>> In general, Mark has outlined a design philosophy for the platform
>> that is conservative, and where, if features are not ready, then they
>> are slipped to the next major release. Features shouldn't be rushed or
>> releases delayed, instead production quality features should be
>> shipped when done.
>> 
>> So, to my mind, this issue comes down to whether the proposed benefit
>> is such that it outweighs the risks of changing the behaviour of
>> millions upon millions of installations. We don't have any systematic
>> data (which I argue should be a huge red flag in itself), and the
>> experience of consultants and performance engineers, including Kirk
>> and myself, is not exactly encouraging. So, does this change really
>> justify the risk?
>> 
>> I would also question the conclusion that all we can organise before
>> Java 10 is: "some reports from the field". For Java 8, the community
>> was able to engage with a pretty good group of F/OSS libraries & help
>> them to test on betas of 8, so they (& their users) could have
>> confidence that they would "just work" with 8 straight out of the box.
>> 
>> I see no reason why a similar approach could not work for G1 becoming
>> default - we can approach relevant partners in the ecosystem (e.g.
>> Cloudbees, Blazemeter, etc) and see if they can help, and we can
>> directly reach out and get people testing with G1. However, there is
>> an issue of timing and available resources here - there's a lot going
>> on for JDK 9 as it is, and I don't know how easy it would be to get
>> this programme running as well.
>> 
>> Finally, the other issue that I'd like to address is that of scope
>> creep. I'd always been under the impression that G1 was thought of as
>> the CMS replacement. However, (and admittedly a lot of the systems I
>> see are either financial or gaming) in its current state there is no
>> way that G1 is a general replacement for CMS. The pauses for G1 are
>> simply too long for a big class of low-latency systems.
>> 
>> Instead, G1 is now being talked of as a replacement for the default
>> collector. If that's the case, then I think we need to acknowledge it,
>> and have a conversation about where G1 is actually supposed to be
>> used. Are we saying we want a "reasonably high throughput with reduced
>> STW, but not low pause time" collector? If we are, that's fine, but
>> that's not where we started.
>> 
>> Thanks,
>> 
>> Ben
>> 
>> On Mon, Jun 1, 2015 at 3:05 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>> Kirk,
>>> 
>>> I don't dispute that some people aren't tuning/touching the GC controls,
>>> and may get negatively impacted (but perhaps positively too).  My main
>>> point, however, is I don't see waiting until Java 10 as adding sufficient
>>> safety guards; certainly there will be more lab time and benchmarking at
>>> Oracle, some reports from the field, but inevitably there will be unknown
>>> workloads in the wild that still don't work well even after more "due
>>> diligence".  If G1 is truly the successor to CMS, kicking the can further
>>> down the road isn't helping achieve that.  Anyone seeing a regression has
>>> an easy way to opt out.  Any such change will always weed out some
>>> outliers, Java 9, 10 or 15.  The longer we wait, the harder it may be to
>>> fix some of them.
>>> 
>>> sent from my phone
>>> 
>>> On Jun 1, 2015 9:43 AM, "Kirk Pepperdine" <kirk at kodewerk.com> wrote:
>>>> 
>>>> Hi Vitaly,
>>>> 
>>>> Ben has only re-iterated what I’ve already said but in a more concise
>>>> way. And, I don’t mean to be insulting but I don’t really buy into the
>>>> argument that people will be specifying a collector anyways because
>>>> there are still a significant number that use the parallel collector.
>>>> In fact, just today, I recommended that someone move away from G1 to
>>>> the parallel collector as that use case clearly favored the
>>>> recommendation.
>>>> 
>>>> And I should add, I’ve now backed a number of deployments off of
>>>> tiered-compilation as IME it is impacting performance in a negative way.
>>>> 
>>>> Regards,
>>>> Kirk
>>>> 
>>>> On Jun 1, 2015, at 3:05 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>> 
>>>>> Ben,
>>>>> 
>>>>> The customers using CMS won't be impacted since they're explicitly
>>>>> specifying the GC.  Java 9 will already require extensive testing for
>>>>> people, and GC performance is luckily one of the more introspectable
>>>>> facilities.  Furthermore, people who are keen on staying with the
>>>>> default collector should/can lock that in before moving to Java 9
>>>>> since presumably there will be enough visibility of this change in
>>>>> release notes and such.
>>>>> 
>>>>> Personally, I find changing default JIT compilation policy to tiered
>>>>> in Java 8 a more risky change, but I don't recall seeing such fervor
>>>>> around it :).
>>>>> 
>>>>> sent from my phone
>>>>> On Jun 1, 2015 6:37 AM, "Ben Evans" <ben at jclarity.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I'm somewhat late to this, having missed the original discussion
>>>>>> whilst travelling.
>>>>>> 
>>>>>> Mark targeted this JEP to JDK 9 but has since put that on hold to
>>>>>> allow more discussion.
>>>>>> 
>>>>>> I made this comment to Mark on jdk9-dev:
>>>>>> 
>>>>>> "I have been working with G1 for ~5 years, ever since it was
>>>>>> experimental (& highly crash-prone in JDK 6).
>>>>>> 
>>>>>> In the intervening time, I have seen dozens (if not hundreds) of
>>>>>> installations, across a wide range of customers. I have participated
>>>>>> in, or been consulted on at least a dozen direct trials of GC
>>>>>> alternatives.
>>>>>> 
>>>>>> It is only in the last 18 months that I have seen *any* real-life
>>>>>> workload on G1 beat the alternatives, and only in the last 12 months
>>>>>> that I've had any customer prepared to go live with G1 in production.
>>>>>> 
>>>>>> From my experience, I think that G1 is a fine collector, with a
>>>>>> bright future that should be pursued. However, I haven't seen
>>>>>> anything that would make a switch to it as default collector seem
>>>>>> compelling in the JDK 9 timeframe.
>>>>>> 
>>>>>> Obviously, my experience is not universal, so I'd like to ask you /
>>>>>> Oracle:
>>>>>> 
>>>>>> 1) Can you explain the survey methodology and customer testing that
>>>>>> you performed to arrive at the conclusion that G1 is ready to become
>>>>>> default?
>>>>>> 
>>>>>> 2) Can you share aggregate results of the surveying ("We worked with
>>>>>> X customers and ran Y tests of G1 vs alternatives, and in Z% of
>>>>>> cases, G1 worked better by W margin")?
>>>>>> 
>>>>>> 3) Can you ask some of the customers you worked with to speak
>>>>>> publicly about the trials you ran with them?"
>>>>>> 
>>>>>> From reading this thread, am I right to conclude that no formal study
>>>>>> of this issue has been done?
>>>>>> 
>>>>>> If that's the case, then are we really happy to make G1 default
>>>>>> without some more systematic efforts and attempts to obtain actual
>>>>>> numbers?
>>>>>> 
>>>>>> The questions that I'd like to see answered are:
>>>>>> 
>>>>>> a) How short a pause time can G1 support being tuned to? 50ms? 20?
>>>>>> Personally, I haven't seen it getting close to CMS in terms of STW
>>>>>> time.
>>>>>> 
>>>>>> b) What is the impact on throughput due to G1?
>>>>>> 
>>>>>> I do like G1 as a collector, but can we really organise enough field
>>>>>> tests in the pre-9 timeframe to justify such a large and potentially
>>>>>> breaking change? We managed to do some good community compatibility
>>>>>> testing for JDK 8, and we could think about a similar effort for
>>>>>> "make G1 default". However, with modules, HTTP/2 and JShell all
>>>>>> happening for 9, I question whether there is simply enough community
>>>>>> bandwidth to do a decent effort for G1 as well, whereas, if we were
>>>>>> targeting JDK 10 we'd have a lot more time to plan and to try to
>>>>>> improve the quality and range of the field data to hopefully de-risk
>>>>>> a potential large, high-profile failure.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Ben
>>>>>> 
>>>>>> 
>>>>>> On Thu, Apr 30, 2015 at 2:55 PM, Monica Beckwith
>>>>>> <monica at beckwithclan.com> wrote:
>>>>>>> I am also FOR the change in the default GC. Charlie and Mattis bring
>>>>>>> up great points. It's about time G1 gets put out there (as the
>>>>>>> default GC) since most of the development work is going into G1. As
>>>>>>> for documentation, we not only need to document the change in the
>>>>>>> default collector but also the defaults for the collector that are
>>>>>>> enabled as soon as G1 is employed - e.g. MaxGCPauseMillis, IHOP, etc.
>>>>>>> 
>>>>>>> With more and more input coming in, G1 is only going to get better
>>>>>>> and hopefully more adaptive :)
>>>>>>> 
>>>>>>> And as for Charlie's question - I don't remember the last time that
>>>>>>> I didn't see an explicit GC mentioned on the command line (even if
>>>>>>> it was the default GC).
>>>>>>> 
>>>>>>> These are just my two cents.
>>>>>>> 
>>>>>>> -Monica
>>>>>>> 
>>>>>>> 
>>>>>>> On 4/30/15 8:17 AM, charlie hunt wrote:
>>>>>>>> 
>>>>>>>> Fwiw, we should not forget that anyone who is currently specifying
>>>>>>>> an explicit GC to use in his or her JVM command line args will not
>>>>>>>> experience any difference in behavior. They will still get the
>>>>>>>> collector they specify to use. The (potential) impact will be on
>>>>>>>> those who do not specify a GC to use.
>>>>>>>> 
>>>>>>>> What I would like to hear from Kirk and others who frequently work
>>>>>>>> with customers on GC is: what’s the percentage of Java applications
>>>>>>>> they have worked with that do not explicitly specify a GC?  And, of
>>>>>>>> those, what percentage of those apps fall into the categories of
>>>>>>>> small heap and desire low latency, or desire high throughput even
>>>>>>>> at the cost of frequent full GCs?
>>>>>>>> 
>>>>>>>> thanks,
>>>>>>>> 
>>>>>>>> charlie
>>>>>>>> 
>>>>>>>>> On Apr 30, 2015, at 7:27 AM, Mattis Castegren
>>>>>>>>> <mattis.castegren at oracle.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi.
>>>>>>>>> 
>>>>>>>>> I also work with customers but I would like to give an argument
>>>>>>>>> FOR changing the default.
>>>>>>>>> 
>>>>>>>>> I don't think we will ever come to a point where G1 is better for
>>>>>>>>> ALL users. Even with a near perfect G1 implementation there may be
>>>>>>>>> cases where the parallel collector gives better throughput.
>>>>>>>>> 
>>>>>>>>> Right now, I think G1 will be better for most users. There are
>>>>>>>>> probably also corner cases where G1 COULD be better, but where
>>>>>>>>> small issues reduce performance. By changing the default to G1, we
>>>>>>>>> will be able to find these more easily as we will expose more
>>>>>>>>> users to G1.
>>>>>>>>> 
>>>>>>>>> Finally, there will be a set of users who only care about
>>>>>>>>> throughput, and who will see a performance regression. In those
>>>>>>>>> cases, they can go back to using parallel. But hopefully, there
>>>>>>>>> will be far fewer users who need to tune their application to run
>>>>>>>>> with parallel GC than there are users who have to (or should) tune
>>>>>>>>> their application to run with G1.
>>>>>>>>> 
>>>>>>>>> In the case of huge, business critical, applications, we will
>>>>>>>>> always introduce a risk by changing default collectors. This is
>>>>>>>>> true if we change to G1 in JDK 9, 10 or 11. I prefer to just rip
>>>>>>>>> the band aid off. We know that the collector we will focus on
>>>>>>>>> going forward is G1, so we should let as many people use it as
>>>>>>>>> possible.
>>>>>>>>> 
>>>>>>>>> Of course we should document this a lot, so that users who go up
>>>>>>>>> to JDK 9 and see performance regressions can at least try to run
>>>>>>>>> with Parallel to see if it is due to the GC.
>>>>>>>>> 
>>>>>>>>> Kind Regards
>>>>>>>>> /Mattis
>>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Kirk Pepperdine [mailto:kirk at kodewerk.com]
>>>>>>>>> Sent: den 30 april 2015 13:18
>>>>>>>>> To: Stefan Johansson
>>>>>>>>> Cc: hotspot-dev at openjdk.java.net Source Developers
>>>>>>>>> Subject: Re: JEP 248: Make G1 the Default Garbage Collector
>>>>>>>>> 
>>>>>>>>> Hi Stefan,
>>>>>>>>> 
>>>>>>>>> Indeed, the improvements have been amazing. I have been getting
>>>>>>>>> many clients to bench with it and although the results have been
>>>>>>>>> mixed, overall many have been able to move forward. However I
>>>>>>>>> still would not recommend G1 to anyone who can't move to 1.8.0_40.
>>>>>>>>> Of course this change will obviously come post _40 but still, the
>>>>>>>>> recent emergence of G1 as a viable production ready collector
>>>>>>>>> suggests that making it a default may be a wee bit optimistic.
>>>>>>>>> 
>>>>>>>>> The change is based on the assumption that limiting latency is
>>>>>>>>> often more important than maximizing throughput. If this
>>>>>>>>> assumption is incorrect then this change might need to be
>>>>>>>>> reconsidered.
>>>>>>>>> 
>>>>>>>>> I would agree with this assumption. In most cases latency is more
>>>>>>>>> important. However G1 doesn't always provide lowest latency
>>>>>>>>> especially in smaller heaps.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> G1 is seen as a robust and well-tested collector. It is not
>>>>>>>>> expected to have stability problems, but becoming the default
>>>>>>>>> collector will increase its visibility and may reveal
>>>>>>>>> previously-unknown issues.
>>>>>>>>> 
>>>>>>>>> I'm not sure it's prudent to treat the entire Java eco-system as
>>>>>>>>> guinea pigs. I believe it's more prudent to have the willing take
>>>>>>>>> that first step rather than have it unwittingly dropped on
>>>>>>>>> everyone.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> At the end of the day, I don't have any say in any of this (as it
>>>>>>>>> should be). All I can do is let you know what I'm seeing through
>>>>>>>>> my straw with the hope that you'll find the information useful.
>>>>>>>>> From what I see, there is not nearly enough experience in tuning
>>>>>>>>> G1, and that is especially true in the general population, to make
>>>>>>>>> this type of change at this point in time. I'm also not sure that
>>>>>>>>> we have all the tuning options we need to ensure "happy apps" in
>>>>>>>>> the wild. For example, I think the incremental accumulated waste
>>>>>>>>> in tenured regions is a problem that I'm not sure we have the
>>>>>>>>> tools to solve. I'm not even sure if it's a recognized problem. In
>>>>>>>>> fact I'm not even sure it's a real problem as at the moment it's
>>>>>>>>> only a theory based on observations I'm making by looking at
>>>>>>>>> numbers of GC logs produced by applications using recent releases
>>>>>>>>> of G1.
>>>>>>>>> 
>>>>>>>>> I would suggest that making Tiered the default config for 8 was
>>>>>>>>> also a bit premature. I've had a number of clients have to roll
>>>>>>>>> back off of it.
>>>>>>>>> 
>>>>>>>>> - Kirk
>>>>>>>>> 
>>>>>>>>> On Apr 29, 2015, at 3:03 PM, Stefan Johansson
>>>>>>>>> <stefan.johansson at oracle.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Kirk,
>>>>>>>>>> 
>>>>>>>>>> A lot of effort is put into G1, it has been continuously
>>>>>>>>>> improving over the last couple of years and we now believe that
>>>>>>>>>> G1 is ready to become the default. G1 will not improve all use
>>>>>>>>>> cases, but the same is true for the other collectors. For users
>>>>>>>>>> where throughput is the main concern, Parallel GC can still be
>>>>>>>>>> used by specifying -XX:+UseParallelGC on the command-line.
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Stefan
>>>>>>>>>> 
>>>>>>>>>> On 2015-04-29 09:10, Kirk Pepperdine wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> Is the G1 ready for this? I see many people moving to G1 but
>>>>>>>>>>> also I'm not sure that we've got the tunables correct. I've
>>>>>>>>>>> been sorting through a number of recent tuning engagements and
>>>>>>>>>>> my conclusion is that I would like the collector to be
>>>>>>>>>>> aggressive about collecting tenured regions at the beginning of
>>>>>>>>>>> a JVM's life time but then become less aggressive over time.
>>>>>>>>>>> The reason is the residual waste that I see left behind because
>>>>>>>>>>> certain regions never hit the threshold needed to be included
>>>>>>>>>>> in the CSET. But, on aggregate, the regions in this state do
>>>>>>>>>>> start to retain a significant amount of dead data. The only way
>>>>>>>>>>> to see the effects is to run regular Full GCs, which of course
>>>>>>>>>>> you don't really want to do. However, the problem seems to
>>>>>>>>>>> settle down a wee bit over time which is why I was thinking
>>>>>>>>>>> that being aggressive about what is collected in the early
>>>>>>>>>>> stages of a JVM's life should lead to better packing and hence
>>>>>>>>>>> less waste.
>>>>>>>>>>> 
>>>>>>>>>>> Note, I don't really care about the memory waste, only its
>>>>>>>>>>> effect on cycle frequencies and pause times.
>>>>>>>>>>> 
>>>>>>>>>>> Sorry but I don't have anything formal about this as I (and I
>>>>>>>>>>> believe many others) are still sorting out what to make of the
>>>>>>>>>>> G1 in prod. Generally the overall results are good but
>>>>>>>>>>> sometimes it's not that way up front and how to improve things
>>>>>>>>>>> is sometimes challenging.
>>>>>>>>>>> 
>>>>>>>>>>> On a side note, the move to Tiered in 8 has also caused a bit
>>>>>>>>>>> of grief. Metaspace has caused a bit of grief and even
>>>>>>>>>>> parallelStream, which works, has come with some interesting
>>>>>>>>>>> side effects. Everyone has been so enamored with Lambdas
>>>>>>>>>>> (rightfully so) that the other stuff has been completely
>>>>>>>>>>> forgotten and some of it has surprised people. I guess I'll be
>>>>>>>>>>> submitting a talk for J1 on some of the field experience I've
>>>>>>>>>>> had with the other stuff.
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Kirk
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Apr 28, 2015, at 11:02 PM, mark.reinhold at oracle.com wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> New JEP Candidate: http://openjdk.java.net/jeps/248
>>>>>>>>>>>> 
>>>>>>>>>>>> - Mark
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Ben Evans, Co-founder jClarity @jclarity
>>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Ben Evans, Co-founder jClarity @jclarity
>> 


