<p dir="ltr">Hi,</p>

<p dir="ltr">Yes the GC-logging is synchronous and part of the GC cycle, so any delay will add to the pause.<br><br></p>

<p dir="ltr">Best regards, <br>

Gustav Åkesson</p>

<div class="gmail_quote">Den 9 apr 2014 18:40 skrev "Cornelius Riemenschneider" <<a href="mailto:cri@itscope.de">cri@itscope.de</a>>:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Bingo!<br>

First, i tried to get a tool which shows me which process writes to which file and how long that takes, but I was unable to find one I could master to use.<br>

Based on your suggestion I moved the log.gc file to a ramdisk and performed extensive load testing - now my biggest outlier is<br>

2014-04-09T18:29:13.873+0200: 383.372: [GC2014-04-09T18:29:13.874+0200: 383.372: [ParNew: 431041K->208698K(3145728K), 1.8312460 secs] 11055599K->11203421K(21495808K), 1.8315130 secs] [Times: user=2.61 sys=0.03, real=1.83 secs]<br>


which is okay.<br>

When moving log.gc back to the harddisk to which redis saves I get pauses as long as 45sec.<br>

<br>

So I agree with Vitaly that redis probably thrashes some caches (though not all, as it only occupies one cpu core of a dual-socket server) which would explain a slowdown of garbage collections, but the real cause of the long spikes is actually writing the log to disk.<br>


<br>

Could you guys check if you write the gc log synchronous?<br>

If yes, I'd suggest (a) moving writing the log out of the STW phase and (b) using asynchronous I/O for writing the log.<br>

<br>

Thanks for all your suggestions and your help!<br>

<br>

Regards,<br>

Cornelius Riemenschneider<br>

--<br>

ITscope GmbH<br>

Ludwig-Erhard-Allee 20<br>

76131 Karlsruhe<br>

Email: <a href="mailto:cornelius.riemenschneider@itscope.de">cornelius.riemenschneider@itscope.de</a><br>

<a href="https://www.itscope.com" target="_blank">https://www.itscope.com</a><br>

Handelsregister: AG Mannheim, HRB 232782<br>

Sitz der Gesellschaft: Karlsruhe<br>

Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger<br>

<br>

<br>

-----Ursprüngliche Nachricht-----<br>

Von: Wolfgang Pedot [mailto:<a href="mailto:wolfgang.pedot@finkzeit.at">wolfgang.pedot@finkzeit.at</a>]<br>

Gesendet: Mittwoch, 9. April 2014 17:29<br>

An: Vitaly Davidovich<br>

Cc: Cornelius Riemenschneider; hotspot-gc-use<br>

Betreff: Re: ridiculous ParNew pause times<br>

<br>

A crazy idea: could the gc-logging be the problem here because the redis-dump is using up all the IO bandwidth?<br>

<br>

Wolfgang<br>

<br>

Am 09.04.2014 17:06, schrieb Vitaly Davidovich:<br>

> Interesting.  How much RAM does this machine have? You're using about<br>

> 21-22g for jvm and another 5g for redis, right? Is there room to spare<br>

> in physical memory?<br>

><br>

> If there's no swap activity, then I don't think there's any I/O<br>

> interference.  I'm guessing that redis writing out 5g of memory is<br>

> blowing the cpu caches for the jvm, and causing page faults.  Although<br>

> if there's no swap activity, it seems these faults would be soft and<br>

> it's hard to explain how that would balloon to 30 sec pauses and not<br>

> show up in sys time then.<br>

><br>

> Are there lots of context switches in the java process when this happens?<br>

><br>

> Sent from my phone<br>

><br>

> On Apr 9, 2014 10:41 AM, "Cornelius Riemenschneider" <<a href="mailto:cri@itscope.de">cri@itscope.de</a><br>

> <mailto:<a href="mailto:cri@itscope.de">cri@itscope.de</a>>> wrote:<br>

><br>

>     Hey,____<br>

><br>

>     thanks for the hints :-)____<br>

><br>

>     The server runs one JVM and one redis instance (A event-based,<br>

>     single threaded in-memory nosql-datastore).____<br>

><br>

>     Redis stores about 5G of data, which are written to disk from time<br>

>     to time ? it now turns out, that the redis saves align perfectly<br>

>     with our long ParNew times.____<br>

><br>

>     By initiating a redis save I was able to get these garbage<br>

>     collections.____<br>

><br>

>     2014-04-09T15:06:07.892+0200: 85043.119:<br>

>     [GC2014-04-09T15:06:18.089+0200: 85053.315: [ParNew:<br>

>     2964426K->338263K(3145728K), 0.1532950 secs]<br>

>     7167668K->4541505K(21495808K), 10.3502410 secs] [Times: user=1.93<br>

>     sys=0.01, real=10.35 secs]____<br>

><br>

>     2014-04-09T15:06:39.203+0200: 85074.429:<br>

>     [GC2014-04-09T15:07:11.026+0200: 85106.252: [ParNew:<br>

>     3145728K->405516K(3145728K), 0.3429470 secs]<br>

>     7645283K->5212053K(21495808K), 32.1663210 secs] [Times: user=2.96<br>

>     sys=0.01, real=32.17 secs]____<br>

><br>

>     I monitored the system during the GCs and swapping definitly does<br>

>     not happen.____<br>

><br>

>     Because redis is single-threaded (and only used by the JVM which is<br>

>     during ParNew in a STW phase), during ParNew there is only redis<br>

>     active, trying to write 5G of memory to disk as fast as possible and<br>

>     the GC.____<br>

><br>

>     I wasn?t able to pinpoint the issue yet, do you have an idea why the<br>

>     jvm could block on I/O during a GC?____<br>

><br>

>     Disk access is during these phases probably way slower than usual,<br>

>     as everything is on the / partition, a RAID 1 disk array.____<br>

><br>

>     Or does anyone know the right perf options for  the perf tool to<br>

>     monitor blocking i/o during garbage collections?____<br>

><br>

>     __ __<br>

><br>

>     Regards,____<br>

><br>

>     Cornelius Riemenschneider____<br>

><br>

>     --____<br>

><br>

>     ITscope GmbH____<br>

><br>

>     Ludwig-Erhard-Allee 20____<br>

><br>

>     76131 Karlsruhe____<br>

><br>

>     Email: <a href="mailto:cornelius.riemenschneider@itscope.de">cornelius.riemenschneider@itscope.de</a><br>

>     <mailto:<a href="mailto:cornelius.riemenschneider@itscope.de">cornelius.riemenschneider@itscope.de</a>>____<br>

><br>

>     <a href="https://www.itscope.com____" target="_blank">https://www.itscope.com____</a><br>

><br>

>     Handelsregister: AG Mannheim, HRB 232782____<br>

><br>

>     Sitz der Gesellschaft: Karlsruhe____<br>

><br>

>     Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger____<br>

><br>

>     __ __<br>

><br>

>     *Von:*Vitaly Davidovich [mailto:<a href="mailto:vitalyd@gmail.com">vitalyd@gmail.com</a><br>

>     <mailto:<a href="mailto:vitalyd@gmail.com">vitalyd@gmail.com</a>>]<br>

>     *Gesendet:* Mittwoch, 9. April 2014 14:49<br>

>     *An:* Cornelius Riemenschneider<br>

>     *Cc:* hotspot-gc-use<br>

>     *Betreff:* Re: ridiculous ParNew pause times____<br>

><br>

>     __ __<br>

><br>

>     This is typically caused by one of:____<br>

><br>

>     1) heavy swapping (this won't count towards sys time because kernel<br>

>     is not using cpu while waiting for I/O to complete)<br>

>     2) oversubscribed machine where gc threads aren't getting enough cpu<br>

>     time due____<br>

><br>

>     Have you looked at stats on the machine when these pauses occur,<br>

>     specifically around swap activity? Is your machine running multiple<br>

>     JVMs or any other noisy neighbor apps?____<br>

><br>

>     Sent from my phone____<br>

><br>

>     On Apr 9, 2014 7:26 AM, "Cornelius Riemenschneider" <<a href="mailto:cri@itscope.de">cri@itscope.de</a><br>

>     <mailto:<a href="mailto:cri@itscope.de">cri@itscope.de</a>>> wrote:____<br>

><br>

>     Hello,____<br>

><br>

>     we?ve got the following problem with the ParNew collector:____<br>

><br>

>     Our log.gc usually looks like this:____<br>

><br>

>     2014-04-09T12:58:02.485+0200: 77357.712:<br>

>     [GC2014-04-09T12:58:02.485+0200: 77357.712: [ParNew:<br>

>     2722925K->100795K(3145728K), 0.3202010 secs]<br>

>     6998057K->4375934K(21495808K), 0.3205670 secs] [Times: user=4.10<br>

>     sys=0.02, real=0.32 secs] ____<br>

><br>

>     2014-04-09T12:58:06.256+0200: 77361.483:<br>

>     [GC2014-04-09T12:58:06.257+0200: 77361.483: [ParNew:<br>

>     2722235K->101011K(3145728K), 0.3229910 secs]<br>

>     6997374K->4376165K(21495808K), 0.3233580 secs] [Times: user=4.13<br>

>     sys=0.02, real=0.33 secs] ____<br>

><br>

>     2014-04-09T12:58:12.295+0200: 77367.522:<br>

>     [GC2014-04-09T12:58:12.296+0200: 77367.522: [ParNew:<br>

>     2722451K->101057K(3145728K), 0.3215320 secs]<br>

>     6997605K->4376216K(21495808K), 0.3219080 secs] [Times: user=4.12<br>

>     sys=0.01, real=0.32 secs] ____<br>

><br>

>     2014-04-09T12:58:18.461+0200: 77373.688:<br>

>     [GC2014-04-09T12:58:18.462+0200: 77373.688: [ParNew:<br>

>     2722497K->2232K(3145728K), 0.2944540 secs]<br>

>     6997656K->4376242K(21495808K), 0.2948280 secs] [Times: user=3.79<br>

>     sys=0.00, real=0.29 secs]____<br>

><br>

>     But occasionally we have entries like these:____<br>

><br>

>     2014-04-09T09:56:12.840+0200: 66448.066:<br>

>     [GC2014-04-09T09:56:38.154+0200: 66473.381: [ParNew:<br>

>     3139808K->524288K(3145728K), 0.8355990 secs]<br>

>     6845512K->4585563K(21495808K), 26.1502640 secs] [Times: user=5.59<br>

>     sys=0.16, real=26.16 secs] ____<br>

><br>

>     2014-04-09T09:57:09.173+0200: 66504.400:<br>

>     [GC2014-04-09T09:57:24.871+0200: 66520.098: [ParNew:<br>

>     2950573K->488555K(3145728K), 0.1876660 secs]<br>

>     8701082K->6239064K(21495808K), 15.8858250 secs] [Times: user=2.38<br>

>     sys=0.00, real=15.89 secs] ____<br>

><br>

>     2014-04-09T12:58:34.661+0200: 77389.888:<br>

>     [GC2014-04-09T12:59:06.390+0200: 77421.616: [ParNew:<br>

>     2623439K->2083K(3145728K), 0.0292270 secs]<br>

>     6997709K->4376490K(21495808K), 31.7578950 secs] [Times: user=0.34<br>

>     sys=0.02, real=31.76 secs] ____<br>

><br>

>     which I can?t explain at all.____<br>

><br>

>     The real time of 31.76sec equals a pause of 31.76secs, in which the<br>

>     jvm does not respond to user requests, which is obviously bad.____<br>

><br>

>     The application is _/very/_ allocation heavy, so generally pauses of<br>

>     0.3sec are okay.____<br>

><br>

>     Our GC settings for this server are:____<br>

><br>

>     -Xms21g____<br>

><br>

>     -Xmx21g____<br>

><br>

>     -XX:ReservedCodeCacheSize=256m____<br>

><br>

>     -XX:PermSize=256m____<br>

><br>

>     -XX:MaxPermSize=768m____<br>

><br>

>     -server____<br>

><br>

>     -verbose:gc____<br>

><br>

>     -Xloggc:log.gc____<br>

><br>

>     -XX:+PrintGCDetails____<br>

><br>

>     -XX:+PrintGCDateStamps____<br>

><br>

>     -XX:+ExplicitGCInvokesConcurrent____<br>

><br>

>     -XX:NewRatio=5____<br>

><br>

>     -XX:SurvivorRatio=5____<br>

><br>

>     -XX:+UseConcMarkSweepGC____<br>

><br>

>     -XX:+UseParNewGC____<br>

><br>

>     -XX:+UseCMSInitiatingOccupancyOnly____<br>

><br>

>     -XX:CMSInitiatingOccupancyFraction=40____<br>

><br>

>     -XX:+CMSClassUnloadingEnabled____<br>

><br>

>     -XX:+CMSScavengeBeforeRemark____<br>

><br>

>     -Dsun.rmi.dgc.client.gcInterval=1209600000____<br>

><br>

>     -Dsun.rmi.dgc.server.gcInterval=1209600000____<br>

><br>

>     ____<br>

><br>

>     We run the sun jdk 7u51 on a current debian wheezy.____<br>

><br>

>     We previously had issues with long ParNew pauses, but back then, the<br>

>     sys time was high, so we concluded that the server was<br>

> swapping,____<br>

><br>

>     which we were able to prevent.____<br>

><br>

>     Do you have any idea or further hint at debugging options which<br>

>     might help us in finding the issue?____<br>

><br>

>     ____<br>

><br>

>     Regards,____<br>

><br>

>     Cornelius Riemenschneider____<br>

><br>

>     --____<br>

><br>

>     ITscope GmbH____<br>

><br>

>     Ludwig-Erhard-Allee 20____<br>

><br>

>     76131 Karlsruhe____<br>

><br>

>     Email: <a href="mailto:cornelius.riemenschneider@itscope.de">cornelius.riemenschneider@itscope.de</a><br>

>     <mailto:<a href="mailto:cornelius.riemenschneider@itscope.de">cornelius.riemenschneider@itscope.de</a>>____<br>

><br>

>     <a href="https://www.itscope.com____" target="_blank">https://www.itscope.com____</a><br>

><br>

>     Handelsregister: AG Mannheim, HRB 232782____<br>

><br>

>     Sitz der Gesellschaft: Karlsruhe____<br>

><br>

>     Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger____<br>

><br>

>     ____<br>

><br>

><br>

>     _______________________________________________<br>

>     hotspot-gc-use mailing list<br>

>     <a href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a> <mailto:<a href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a>><br>

>     <a href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use____" target="_blank">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use____</a><br>

><br>

><br>

><br>

> _______________________________________________<br>

> hotspot-gc-use mailing list<br>

> <a href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a><br>

> <a href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use" target="_blank">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use</a><br>

><br>

<br>

<br>

_______________________________________________<br>

hotspot-gc-use mailing list<br>

<a href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a><br>

<a href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use" target="_blank">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use</a><br>

</blockquote></div>