All~

Attached is a very small reproduction script. From the included README:

1) set JAVA_HOME
2) build with `make`
3) run with

java -verbose:gc -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -XX:+PrintGCTaskTimeStamps -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xms1g -Xmx1g -Xmn10m -Djava.library.path=lib -jar tester.jar

4) notice the growing GC times


From having experimented with this, the key is the combination of the jvm->AttachCurrentThread()/jvm->DetachCurrentThread() calls with the env->GetObjectClass() and env->GetMethodID() calls.

If you attach and detach within the loop, the GC times do not grow without bound. If you cache the results of GetMethodID outside the loop, the GC times likewise do not grow without bound.
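
For concreteness, the problematic shape of the native code is roughly the following (a minimal sketch, not the attached script; the function name, the callback object, and the "tick"/"()V" method are invented for illustration):

    #include <jni.h>

    // Minimal sketch of the problematic pattern. 'callback' is assumed to be
    // a JNI global reference created elsewhere; "tick"/"()V" are invented names.
    static void run_worker(JavaVM *jvm, jobject callback) {
        JNIEnv *env = NULL;
        jvm->AttachCurrentThread((void **)&env, NULL);    // attach once, up front

        for (long i = 0; i < 100000000L; ++i) {           // thread stays attached throughout
            jclass cls = env->GetObjectClass(callback);            // looked up every iteration
            jmethodID mid = env->GetMethodID(cls, "tick", "()V");  // looked up every iteration
            env->CallVoidMethod(callback, mid);
            env->DeleteLocalRef(cls);                     // local refs are released, so this is
        }                                                 // not a JNI local-reference leak

        jvm->DetachCurrentThread();                       // only at the very end of the run
    }

With the thread staying attached for the whole run and the lookups repeated every iteration, the young gen pause times grow steadily.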

Looking at src/share/vm/runtime/handles.cpp:76, you see the comment

// The thread local handle areas should not get very large

However, src/share/vm/prims/jni.cpp:1229 allocates a KlassHandle in each call to GetMethodID, which seems to leak as it is never freed from the HandleArea of the thread...


Obviously, I can fix my code to cache its jmethodIDs and/or detach threads more often (which I have done, and it seems to fix my problem). But this seems like a surprising side effect of GetMethodID()...
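
Concretely, the caching fix looks roughly like this (again only a sketch, using the same invented names as above; real code would fill the cached IDs in once, e.g. from JNI_OnLoad):

    #include <jni.h>

    // Cache the method id once; keep a global ref to the class so it cannot be
    // unloaded while the cached jmethodID is in use.
    static jclass    g_cls = NULL;
    static jmethodID g_mid = NULL;

    static void init_ids(JNIEnv *env, jobject callback) {
        jclass local = env->GetObjectClass(callback);
        g_cls = (jclass)env->NewGlobalRef(local);
        g_mid = env->GetMethodID(g_cls, "tick", "()V");   // one lookup, done once
        env->DeleteLocalRef(local);
    }

    static void run_worker(JavaVM *jvm, jobject callback) {
        JNIEnv *env = NULL;
        jvm->AttachCurrentThread((void **)&env, NULL);
        for (long i = 0; i < 100000000L; ++i) {
            env->CallVoidMethod(callback, g_mid);         // no per-iteration lookups
        }
        jvm->DetachCurrentThread();
    }

Attaching and detaching around each batch of work, instead of once per thread lifetime, also makes the growth go away, presumably because detaching lets the thread's handle area be torn down.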

Matt

On Thu, Jun 3, 2010 at 5:40 PM, Matt Fowles <matt.fowles@gmail.com> wrote:
Eric~

That is my suspicion as well. It would be nice if there were a flag or something to print stats for JNI handles...

Matt

On Thu, Jun 3, 2010 at 4:57 PM, Eric Caspole <eric.caspole@amd.com> wrote:

Hi Matt,
I had a problem like this in a previous job. It turned out to be leaking jni handles. The jni handles get scanned during the GC even if they are stale/useless. So I would carefully inspect your JNI code for incorrect use of jni handles.

I debugged this problem by doing before/after oprofiles and comparing the runs to see what was taking more time in gc, if that is any help to you.
Regards,
Eric


On Jun 3, 2010, at 1:17 PM, Matt Fowles wrote:
All~

Today we were able to isolate the piece of our system that is causing the increase in GC times. We can now reproduce on multiple machines in a much faster manner (about 20 minutes). The attached log represents a run from just this component. There is a full GC part way through the run that we forced using VisualVM. The following interesting facts are presented for your amusement/edification:

1) the young gen pause times increase steadily over the course of the run
2) the full GC doesn't affect the young gen pause times
3) the component in question makes heavy use of JNI

I suspect the 3rd fact is the most interesting one. We will [obviously] be looking into reducing the test case further and possibly fixing our JNI code. But this represents a huge leap forward in our understanding of the problem.

Matt

On Fri, May 14, 2010 at 3:32 PM, Y. Srinivas Ramakrishna <y.s.ramakrishna@oracle.com> wrote:
Matt -- Yes, comparative data for all these for 6u20 and jdk 7
would be great. Naturally, server 1 is most immediately useful
for determining if 6631166 addresses this at all,
but others would be useful too if it turns out it doesn't
(i.e. if jdk 7's server 1 turns out to be no better than 6u20's --
at which point we should get this into the right channel -- open a bug,
and a support case).

thanks.
-- ramki

On 05/14/10 12:24, Matt Fowles wrote:
Ramki~

I am preparing the flags for the next 3 runs (which run in parallel) and wanted to check a few things with you. I believe that each of these is collecting a useful data point:

Server 1 is running with 8 threads, reduced young gen, and MTT 1.
Server 2 is running with 8 threads, reduced young gen, MTT 1, and ParNew, but NOT CMS.
Server 3 is running with 8 threads, reduced young gen, MTT 1, and PrintFLSStatistics.

I can (additionally) run all of these tests on JDK7 (Java HotSpot(TM) 64-Bit Server VM (build 17.0-b05, mixed mode)).

Server 1:
-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCTaskTimeStamps
-XX:+PrintCommandLineFlags

-Xms32g -Xmx32g -Xmn1g
-XX:+UseParNewGC
-XX:ParallelGCThreads=8
-XX:+UseConcMarkSweepGC
-XX:ParallelCMSThreads=8
-XX:MaxTenuringThreshold=1
-XX:SurvivorRatio=14
-XX:+CMSParallelRemarkEnabled
-Xloggc:gc1.log
-XX:+UseLargePages -XX:+AlwaysPreTouch

Server 2:
-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCTaskTimeStamps
-XX:+PrintCommandLineFlags

-Xms32g -Xmx32g -Xmn1g
-XX:+UseParNewGC
-XX:ParallelGCThreads=8
-XX:MaxTenuringThreshold=1
-XX:SurvivorRatio=14
-Xloggc:gc2.log
-XX:+UseLargePages -XX:+AlwaysPreTouch


Server 3:
-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCTaskTimeStamps
-XX:+PrintCommandLineFlags

-Xms32g -Xmx32g -Xmn1g
-XX:+UseParNewGC
-XX:ParallelGCThreads=8
-XX:+UseConcMarkSweepGC
-XX:ParallelCMSThreads=8
-XX:MaxTenuringThreshold=1
-XX:SurvivorRatio=14
-XX:+CMSParallelRemarkEnabled
-Xloggc:gc3.log

-XX:PrintFLSStatistics=2
-XX:+UseLargePages
-XX:+AlwaysPreTouch

Matt

On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna <y.s.ramakrishna@oracle.com> wrote:
> On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote:
>>
>> On 05/14/10 10:24, Matt Fowles wrote:
>>>
>>> Jon~
>>>
>>> That makes sense, but the fact is that the old gen *never* gets
>>> collected. So all the allocations happen from the giant empty space
>>> at the end of the free list. I thought fragmentation only occurred
>>> when the free lists are added to after freeing memory...
>>
>> As Jon indicated, allocation is done from free lists of blocks
>> that are pre-carved on demand to avoid contention while allocating.
>> The old heuristics for how large to make those lists and the
>> inventory to hold in those lists were not working well as you
>> scaled the number of workers. Following 6631166 we believe it
>> works better and causes both less contention and less
>> fragmentation than it did before, because we do not hold
>> unnecessary excess inventory of free blocks.
>
> To see what the fragmentation is, try -XX:PrintFLSStatistics=2.
> This will slow down your scavenge pauses (perhaps by quite a bit
> for your 26 GB heap), but you will get a report of the number of
> blocks on free lists and how fragmented the space is on that count
> (for some appropriate notion of fragmentation). Don't use that
> flag in production though :-)
>
> -- ramki
>
>>
>> The fragmentation in turn causes card-scanning to suffer
>> adversely, besides the issues with loss of spatial locality also
>> increasing cache misses and TLB misses. (The large page
>> option might help mitigate the latter a bit, especially
>> since you have such a large heap and our fragmented
>> allocation may be exacerbating the TLB pressure.)
>>
>> -- ramki
>>
>>> Matt
>>>
>>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu <jon.masamitsu@oracle.com> wrote:
>>>>
>>>> Matt,
>>>>
>>>> To amplify on Ramki's comment, the allocations out of the
>>>> old generation are always from a free list. During a young
>>>> generation collection each GC thread will get its own
>>>> local free lists from the old generation so that it can
>>>> copy objects to the old generation without synchronizing
>>>> with the other GC threads (most of the time). Objects from
>>>> a GC thread's local free lists are pushed to the global lists
>>>> after the collection (as far as I recall). So there is some
>>>> churn in the free lists.
>>>>
>>>> Jon
>>>>
>>>> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote:
>>>>>
>>>>> On 05/13/10 10:50, Matt Fowles wrote:
>>>>>>
>>>>>> Jon~
>>>>>>
>>>>>> This may sound naive, but how can fragmentation be an issue if the old
>>>>>> gen has never been collected? I would think we are still in the space
>>>>>> where we can just bump the old gen alloc pointer...
>>>>>
>>>>> Matt, the old gen allocator may fragment the space. Allocation is not
>>>>> exactly "bump a pointer".
>>>>>
>>>>> -- ramki
>>>>>
>>>>>> Matt
>>>>>>
>>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu
>>>>>> <jon.masamitsu@oracle.com> wrote:
>>>>>>>
>>>>>>> Matt,
>>>>>>>
>>>>>>> As Ramki indicated, fragmentation might be an issue. As the
>>>>>>> fragmentation in the old generation increases, it takes longer to
>>>>>>> find space in the old generation into which to promote objects from
>>>>>>> the young generation. This is apparently not the problem that Wayne
>>>>>>> is having, but you still might be hitting it. If you can connect
>>>>>>> jconsole to the VM and force a full GC, that would tell us if it's
>>>>>>> fragmentation.
>>>>>>>
>>>>>>> There might be a scaling issue with UseParNewGC. If you can use
>>>>>>> -XX:-UseParNewGC (turning off the parallel young generation
>>>>>>> collection) with -XX:+UseConcMarkSweepGC, the pauses will be longer
>>>>>>> but may be more stable. That's not the solution, but just part of
>>>>>>> the investigation.
>>>>>>>
>>>>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC,
>>>>>>> and if you don't see the growing young generation pause, that would
>>>>>>> indicate something specific about promotion into the CMS generation.
>>>>>>>
>>>>>>> UseParallelGC is different from UseParNewGC in a number of ways, and
>>>>>>> if you try UseParallelGC and still see the growing young generation
>>>>>>> pauses, I'd suspect something special about your application.
>>>>>>>
>>>>>>> If you can run these experiments, hopefully they will tell us where
>>>>>>> to look next.
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>>
>>>>>>> On 05/12/10 15:19, Matt Fowles wrote:
>>>>>>>
>>>>>>> All~
>>>>>>>
>>>>>>> I have a large app that produces ~4g of garbage every 30 seconds and
>>>>>>> am trying to reduce the size of gc outliers. About 99% of this data
>>>>>>> is garbage, but almost anything that survives one collection survives
>>>>>>> for an indeterminately long amount of time. We are currently using
>>>>>>> the following VM and options:
>>>>>>>
>>>>>>> java version "1.6.0_20"
>>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>>>>>
>>>>>>> -verbose:gc
>>>>>>> -XX:+PrintGCTimeStamps
>>>>>>> -XX:+PrintGCDetails
>>>>>>> -XX:+PrintGCTaskTimeStamps
>>>>>>> -XX:+PrintTenuringDistribution
>>>>>>> -XX:+PrintCommandLineFlags
>>>>>>> -XX:+PrintReferenceGC
>>>>>>> -Xms32g -Xmx32g -Xmn4g
>>>>>>> -XX:+UseParNewGC
>>>>>>> -XX:ParallelGCThreads=4
>>>>>>> -XX:+UseConcMarkSweepGC
>>>>>>> -XX:ParallelCMSThreads=4
>>>>>>> -XX:CMSInitiatingOccupancyFraction=60
>>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>>>>> -XX:+CMSParallelRemarkEnabled
>>>>>>> -XX:MaxGCPauseMillis=50
>>>>>>> -Xloggc:gc.log
>>>>>>>
>>>>>>>
>>>>>>> As you can see from the GC log, we never actually reach the point
>>>>>>> where the CMS kicks in (after app startup). But our young gens seem
>>>>>>> to take increasingly long to collect as time goes by.
>>>>>>>
>>>>>>> The steady state of the app is reached around 956.392 into the log
>>>>>>> with a collection that takes 0.106 seconds. Thereafter the survivor
>>>>>>> space remains roughly constantly filled and the amount promoted to
>>>>>>> old gen also remains constant, but the collection times increase to
>>>>>>> 2.855 seconds by the end of the 3.5 hour run.
>>>>>>>
>>>>>>> Has anyone seen this sort of behavior before? Are there more
>>>>>>> switches that I should try running with?
>>>>>>>
>>>>>>> Obviously, I am working to profile the app and reduce the garbage
>>>>>>> load in parallel. But if I still see this sort of problem, it is only
>>>>>>> a question of how long the app must run before I see unacceptable
>>>>>>> latency spikes.
>>>>>>>
>>>>>>> Matt
>>>>>>>