CMSEdenChunksRecordAlways & CMSParallelInitialMarkEnabled

graham sanderson graham at vast.com
Thu Jun 19 19:22:08 UTC 2014


Ok, thanks…

The 3 before the 1129 and the 6 before the 11459 are the node numbers

I’ll dig around in the source for any way of finding out what the cause of safe points are (I’m not aware of a better -XX: option) … frankly I’ll probably just report this in the user hotspot-gc-use thread (it isn’t causing any real issues) and see if other people report it too.

Thanks,

Graham

On Jun 19, 2014, at 12:23 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:

> Graham,
> 
> I don't have any guesses about what is causing the difference in
> the number of safepoints.  As you note the messages you are
> counting are not specifically GC pauses.    The differences you
> are seeing are huge (1129 vs 11459, which, by the way, I don't 
> see those numbers in the tables; did you mean 31129 vs 611459?).
> If those numbers are really GC related I would expect some
> dramatic effect on the application behavior.
> 
> Sometimes better GC means applications run faster so
> maybe more work is getting done in nodes 4-6.
> The scale of the difference is surprising though.
> 
> Jon
> 
> On 6/18/2014 8:19 PM, graham sanderson wrote:
>> The options are working great and as expected (faster initial mark, and no long pauses after abortable preclean timeout). One weird thing though which I’m curious about: I’m showing some data for six JVMs (calling them nodes - they are on separate machines)
>> 
>> all with :
>> 
>> Linux version 2.6.32-431.3.1.el6.x86_64 (mockbuild at c6b10.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Jan 3 21:39:27 UTC 2014
>> JDK 1.7.0_60-b19
>> 16 gig old gen
>> 8 gig new (6.4 gig eden)
>> -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> -XX:+CMSParallelRemarkEnabled
>> 256 gig RAM
>> 16 cores (sandy bridge)
>> 
>> Nodes 4-6 also have 
>> 
>> -XX:+CMSEdenChunksRecordAlways
>> -XX:+CMSParallelInitialMarkEnabled
>> 
>> There are some application level config differences (which limit amount of certain objects kept in memory before flushing to disk) - 1&4 have the same app config, 2&5 have the same app config, 3&6 have the same app config
>> 
>> This first dump, shows two days worth of total times application threads were stopped via grepping logs for Total time for which application threads were stopped and summing the values. worst case 4 minutes over the day is not too bad, so this isn’t a big issue
>> 
>> 2014-06-17 : 1 154.623
>> 2014-06-17 : 2 90.3006
>> 2014-06-17 : 3 75.3602
>> 2014-06-17 : 4 180.618
>> 2014-06-17 : 5 107.668
>> 2014-06-17 : 6 99.7783
>> -------
>> 2014-06-18 : 1 190.741
>> 2014-06-18 : 2 82.8865
>> 2014-06-18 : 3 90.0098
>> 2014-06-18 : 4 239.702
>> 2014-06-18 : 5 149.332
>> 2014-06-18 : 6 138.03
>> 
>> Notably however if you look via JMX/visualGC, the total GC time is actually lower on nodes 4 to 6 than the equivalent nodes 1 to 3. Now I know that biased lock revocation and other things cause safe point, so I figure something other than GC must be the cause… so I just did a count of log lines with  Total time for which application threads were stopped and got this:
>> 
>> 2014-06-17 : 1 19282
>> 2014-06-17 : 2 6784
>> 2014-06-17 : 3 1275
>> 2014-06-17 : 4 26356
>> 2014-06-17 : 5 14491
>> 2014-06-17 : 6 8402
>> -------
>> 2014-06-18 : 1 20943
>> 2014-06-18 : 2 1134
>> 2014-06-18 : 3 1129
>> 2014-06-18 : 4 30289
>> 2014-06-18 : 5 16508
>> 2014-06-18 : 6 11459
>> 
>> I can’t cycle these nodes right now (to try each new parameter individually), but am curious whether you can think of why adding these parameters would have such a large effect on the number of safe point stops - e.g. 1129 vs 11459 for otherwise identically configured nodes with very similar workload. Note the ratio is highest on nodes 2 vs node 5 which spill the least into the old generation (so certainly fewer CMS cycles, and also fewer young gen collections) if that sparks any ideas.
>> 
>> Thanks,
>> 
>> Graham.
>> 
>> P.S. It is entirely possible I don’t know exactly what Total time for which application threads were stopped refers to in all cases (I’m assuming it is a safe point stop)
>> 
>> On Jun 16, 2014, at 1:54 PM, graham sanderson <graham at vast.com> wrote:
>> 
>>> Thanks Jon; that’s exactly what i was hoping
>>> 
>>> On Jun 16, 2014, at 1:27 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>> 
>>>> 
>>>> On 06/12/2014 09:16 PM, graham sanderson wrote:
>>>>> Hi, I hope this is the right list for this question:
>>>>> 
>>>>> I was investigating abortable preclean timeouts in our app (and associated long remark pause) so had a look at the old jdk6 code I had on my box, wondered about recording eden chunks during certain eden slow allocation paths (I wasn’t sure if TLAB allocation is just a CAS bump), and saw what looked perfect in the latest code, so was excited to install 1.7.0_60-b19
>>>>> 
>>>>> I wanted to ask what you consider the stability of these two options to be (I’m pretty sure at least the first one is new in this release)
>>>>> 
>>>>> I have just installed locally on my mac, and am aware of http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809 which I could reproduce, however I wasn’t able to reproduce it without -XX:-UseCMSCompactAtFullCollection
>>>>>                               (is this your understanding too?)
>>>> 
>>>> Yes.
>>>> 
>>>>> 
>>>>> We are running our application with 8 gig young generation (6.4g eden), on boxes with 32 cores… so parallelism is good for short pauses
>>>>> 
>>>>> we already have
>>>>> 
>>>>> -XX:+UseParNewGC 
>>>>> -XX:+UseConcMarkSweepGC
>>>>> -XX:+CMSParallelRemarkEnabled
>>>>> 
>>>>> we have seen a few long(isn) initial marks, so 
>>>>> 
>>>>> -XX:+CMSParallelInitialMarkEnabled sounds good
>>>>> 
>>>>> as for 
>>>>> 
>>>>> -XX:+CMSEdenChunksRecordAlways
>>>>> 
>>>>> my question is: what constitutes a slow path such an eden chunk is potentially recorded… TLAB allocation, or more horrific things; basically (and I’ll test our app with -XX:+CMSPrintEdenSurvivorChunks) is it likely that I’ll actually get less samples using -XX:+CMSEdenChunksRecordAlways in a highly multithread app than I would with sampling, or put another way… what sort of app allocation patterns if any might avoid the slow path altogether and might leave me with just one chunk?
>>>> 
>>>> Fast path allocation is done from TLAB's.  If you have to get
>>>> a new TLAB, the call to get the new TLAB comes from compiled
>>>> code but the call is into the JVM and  that is the slow path where
>>>> the sampling is done.
>>>> 
>>>> Jon
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Graham
>>>>> 
>>>>> P.S. less relevant I think, but our old generation is 16g
>>>>> P.P.S. I suspect the abortable preclean timeouts mostly happen after a burst of very high allocation rate followed by an almost complete lull… this is one of the patterns that can happen in our application
>>>> 
>>> 
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20140619/46f6ee2b/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1574 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20140619/46f6ee2b/smime.p7s>


More information about the hotspot-gc-dev mailing list