Unreasonably long ParNew, CMS pause times

Y. S. Ramakrishna y.s.ramakrishna at oracle.com
Wed Jul 21 14:16:28 PDT 2010



On 07/21/10 13:41, Eagle Kessler wrote:
> Checking the configs, it looks like we are explicitly setting MTT to 0:
> 
> # Min, max, total JVM size (-Xms -Xmx)
> JVM_SIZE="-server -Xms2g -Xmx2g"
> 
> # New Generation Sizes (-XX:NewSize -XX:MaxNewSize)
> 
> JVM_SIZE_NEW="-XX:NewSize=512m -XX:MaxNewSize=512m"
> 
> # Perm Generation Sizes (-XX:PermSize -XX:MaxPermSize)
> JVM_SIZE_PERM="-XX:PermSize=128m -XX:MaxPermSize=128m"
> 
> # Type of Garbage Collector to use
> 
> JVM_GC_TYPE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
> 
> JVM_GC_OPTS="-XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128"
> 
> I agree that in this case we definitely want to be using survivor spaces 
> (I'm unfortunately not in charge of the default GC settings yet). 
> However, I didn't know that running without survivor spaces could cause 
> this kind of behavior. Why does running without survivor spaces cause 
> such a large performance issue?

See below.

> 
> Regardless, I'll ask that -XX:+PrintTenuringDistribution be added to the 
> configs, along with a non-zero MTT, and see if the issue persists. The 

And of course set SurvivorRatio to a more reasonable value, like 6 or 8,
depending on the expected survival rate etc.
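
For concreteness, a revised set of options along those lines might look
like the following (the MaxTenuringThreshold of 4 and the extra print
flags here are illustrative starting points, not tuned values; adjust
MTT against the observed tenuring distribution):

# Each survivor space becomes NewSize/(SurvivorRatio+2) = 512m/10 ~= 51m
JVM_GC_OPTS="-XX:MaxTenuringThreshold=4 -XX:SurvivorRatio=8 -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC"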

> rising ParNew times seem like they would be unrelated to the tenuring 
> threshold, though, wouldn't they?

No, it's related in the sense that MTT=0 was causing very, very short-lived
data to be promoted into the old gen. This can cause object demographics
(size and age distribution) to be very non-stationary, and confuse the
heuristics for sizing the free list inventory. It of course also places
large pressure on the old gen allocator, increases fragmentation, increases
mutation rates, and so on.
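
To see why the posted settings effectively do away with the survivor
spaces, here is the rough arithmetic, assuming the usual HotSpot sizing
rule of survivor = NewSize / (SurvivorRatio + 2) and the default
TargetSurvivorRatio of 50:

  each survivor space    ~= 512m / (128 + 2) ~= 4m
  desired survivor size   = 50% of ~4m       ~= 2m

which matches the "Desired survivor size 2064384 bytes" lines in your
log below. And with MaxTenuringThreshold=0, even that ~4m goes unused:
every object still live at a minor collection is promoted straight into
the old gen.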

The only thing it might not affect much is the initial mark pauses,
which will probably stay as they were before.

-- ramki

> 
> On Wed, Jul 21, 2010 at 12:40 PM, Y. S. Ramakrishna 
> <y.s.ramakrishna at oracle.com <mailto:y.s.ramakrishna at oracle.com>> wrote:
> 
>     oh, and the "(max 0)" seems to imply you have somehow ended up
>     asking for MaxTenuringThreshold=0, which can be disastrous. It is
>     possible that the default settings for CMS on certain very old JVMs
>     did indeed result in MTT=0 (though my memory of that era is fading),
>     so if using JVMs of such older vintage, explicitly setting MTT and
>     SurvivorRatio higher is a good idea.
> 
>     -- ramki
> 
> 
>     On 07/21/10 12:34, Y. S. Ramakrishna wrote:
> 
>         Yes, like what I think Matt is getting at, I'd configure
>         sufficiently large survivor spaces. Even if you expect most of
>         your objects to die young, you'd want survivor spaces large
>         enough to keep at least age-1 objects in the survivor space.
>         If, as you state, no medium- to long-lived state is retained,
>         your data is mostly short-lived and you'll be able to do
>         without any promotion at all. Your problem here is that somehow
>         your survivor spaces may have disappeared. (+PrintHeapAtGC will
>         tell you, and of course examining your JVM options should shed
>         more light on that apparent disappearance.)
> 
>         -- ramki
> 
>         On 07/21/10 10:09, Eagle Kessler wrote:
> 
>             Hello all,
>             I've got a web service that is showing some very strange
>             behavior around its garbage collection, and I was wondering
>             if someone here could explain why it might be happening.
>             The service itself is fairly simple: take in data from a
>             few sources and merge it with the existing data in the
>             database. It stores nearly no state while doing this, and
>             indeed heap dumps taken 1, 24, and 72 hours after boot
>             indicate that we have a consistent ~12MB of live data (in a
>             2GB heap, but I hope that's not what is causing this).
> 
>             The GC logs, though, don't look very happy at all. After
>             our start-up period, they settle into a fairly consistent
>             pattern:
> 
>             1041.159: [GC 1041.159: [ParNew
>             Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>             : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs]
>             1606.500: [GC 1606.500: [ParNew
>             Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>             : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs]
>             2040.773: [GC 2040.773: [ParNew
>             Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>             : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs]
>             It would be very nice if it kept going like that. However,
>             by the first time the CMS collector runs, things aren't
>             working nearly as well:
> 
>             214182.439: [GC 214182.439: [ParNew
>             Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>             : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs]
>             214247.437: [GC 214247.438: [ParNew
>             Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>             : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs]
>             214313.642: [GC 214313.642: [ParNew
>             Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>             : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs]
> 
>             The increasing sys time is a bit worrying, but it seems
>             like the actual GC time is rising as well, even though we
>             aren't collecting any more young-gen garbage. At this
>             point, CMS went off:
> 
>             214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 secs]
>             214382.589: [CMS-concurrent-mark-start]
>             214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 sys=0.01, real=0.47 secs]
>             214383.056: [CMS-concurrent-preclean-start]
>             214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>             214383.064: [CMS-concurrent-abortable-preclean-start]
>              CMS: abort preclean due to time 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 sys=0.02, real=5.07 secs]
>             214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs]
>             214389.705: [CMS-concurrent-sweep-start]
>             214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 sys=0.00, real=0.73 secs]
>             214390.439: [CMS-concurrent-reset-start]
>             214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 sys=0.02, real=0.18 secs]
> 
>             It seems like a) initial-mark shouldn't take 1.8 seconds,
>             b) if we really do only have 12MB of live data, CMS should
>             have collected a lot more than it did (the next ParNew
>             collection reported ~545MB of old gen in use), and c) 50%
>             heap usage with very little promotion seems very early for
>             the collector to go off.
> 
>             The next CMS cycle is at 434,973 seconds, by which point
>             the young gen collections are taking 3 seconds (user 3.59,
>             sys 1.60, real 3.09). The initial mark takes 4.82 seconds
>             (user 3.82, sys 0.02, real 4.82), and sweeps down to 1.1GB
>             of used old gen. I haven't yet confirmed it, but given the
>             previous heap dumps I'd guess that they will claim 12MB of
>             live objects and 1.1GB of dead objects. The current young
>             gen collections (at 497,601 seconds) are taking ~3.7
>             seconds (4.33 user, 2.03 sys). Any idea what could be going
>             on here? We're running JDK 1.6_16.
> 
>             -- 
>             Eagle Kessler
> 
> 
> -- 
> Eagle Kessler
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

