Review request: 7072527 CMS: JMM GC counters overcount in some cases

Ramki Ramakrishna y.s.ramakrishna at oracle.com
Mon Aug 1 11:35:33 PDT 2011


Looks good to me; ship it!

-- ramki

On 7/29/2011 4:13 PM, Kevin Walls wrote:
>
> Actually I missed the other comment that we could go further and 
> remove the unneeded constructor.
>
> http://cr.openjdk.java.net/~kevinw/7072527/webrev.01/
>
> Thanks
> Kevin
>
> On 29/07/11 17:35, Kevin Walls wrote:
>> Hi --
>>
>> I got this ready in a webrev:
>>
>> http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/
>>
>> The new test is passing, as are the old ones. 8-)
>>
>> Also, this change means that a CMS cycle ending in concurrent mode 
>> failure now counts as one collection.  One could argue that either 
>> way (should it be 2 collections?) - but I'm thinking if we are 
>> counting "completed" collections then we are now counting correctly!
>>
>> Thanks
>> Kevin
>>
>>
>> On 29/07/11 02:08, Y. S. Ramakrishna wrote:
>>> I filed: 7072527 CMS: JMM GC counters overcount in some cases
>>>
>>> On 07/28/11 17:29, Y. S. Ramakrishna wrote:
>>>> Hi Kevin --
>>>>
>>>> thanks for jumping on this! More inline below ...
>>>>
>>>> On 07/28/11 09:33, Krystal Mok wrote:
>>>>> Hi Kevin,
>>>>>
>>>>> Thank you for taking care of this, and it's good to see the 
>>>>> problem is verified.
>>>>>
>>>>> I think whether or not the suggested fix is sufficient depends on 
>>>>> what paths can reach CMSCollector::do_compaction_work(). If all 
>>>>> paths that can reach CMSCollector::do_compaction_work() come from 
>>>>> GenCollectedHeap::do_collection(), then the fix should be good to 
>>>>> go. Otherwise it'll need a better workaround.
>>>>>
>>>>> I believe all concurrent mode failures/interrupts (which includes 
>>>>> the System.gc() case) does come from 
>>>>> GenCollectedHeap::do_collection(), but I'm not exactly sure about 
>>>>> this, could anybody please clarify on it?
>>>>
>>>> Yes, i believe this is indeed the case, and my browsing of the
>>>> code using cscope seemed to confirm that belief.
>>>>
>>>> More below ...
>>>>
>>>>>
>>>>> Regards,
>>>>> Kris Mok
>>>>>
>>>>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls 
>>>>> <kevin.walls at oracle.com <mailto:kevin.walls at oracle.com>> wrote:
>>>>>
>>>>>     __
>>>>>     Hi --
>>>>>
>>>>>     6580448 was marked as a duplicate of 6581734, which fixed the 
>>>>> fact
>>>>>     that CMS collections were just not counted at all - with CMS, 
>>>>> only a
>>>>>     stop the world full gc would be counted in the stats.
>>>>>
>>>>>     But looks like you're right...  Here is a quick variation of the
>>>>>     testcase from 6581734 which shows the same thing, and this 
>>>>> verifies
>>>>>     the same, and is solved by ExplicitGCInvokesConcurrent.  If 
>>>>> there is
>>>>>     no other feedback I can test if the removal of the
>>>>>     TraceCMSMemoryManagerStats() call in
>>>>>     CMSCollector::do_compaction_work is all we need...
>>>>
>>>>
>>>> Kevin, yes, it would be great if you could verify this and push the 
>>>> fix.
>>>> I am not sure if the push would need to wait for the signing of OCA
>>>> from Kris, but best to check with Those Who Would Know Such Things.
>>>>
>>>> Since the original CR has been closed, i'll open one momentarily and
>>>> can make you RE (if that's OK with you). I'll be happy to serve as
>>>> reviewer of the change.
>>>>
>>>> As regards the jstat counter reporting two pauses per concurrent
>>>> CMS cycle, I am of two minds on what the original intention
>>>> was. I'd have originally regarded the double increment as a
>>>> bug, but as you state it is really two pauses, even if part of
>>>> a single cycle. And it makes sense to count them as two. I
>>>> agree that this should be documented and left alone, given
>>>> how long we have had this behaviour, and the alternative
>>>> (of counting cycles, rather than pauses) may be no better
>>>> (or arguably worse). There's actually an open CR for this
>>>> which we can redirect into a CR to update the relevant documentation.
>>>>
>>>> -- ramki
>>>>
>>>>>
>>>>>     Regards
>>>>>     Kevin
>>>>>
>>>>>
>>>>>     /*
>>>>>      * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>>>>     reserved.
>>>>>      * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>>>      *
>>>>>      * This code is free software; you can redistribute it and/or 
>>>>> modify it
>>>>>      * under the terms of the GNU General Public License version 2 
>>>>> only, as
>>>>>      * published by the Free Software Foundation.
>>>>>      *
>>>>>      * This code is distributed in the hope that it will be 
>>>>> useful, but
>>>>>     WITHOUT
>>>>>      * ANY WARRANTY; without even the implied warranty of 
>>>>> MERCHANTABILITY or
>>>>>      * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General 
>>>>> Public License
>>>>>      * version 2 for more details (a copy is included in the LICENSE
>>>>>     file that
>>>>>      * accompanied this code).
>>>>>      *
>>>>>      * You should have received a copy of the GNU General Public 
>>>>> License
>>>>>     version
>>>>>      * 2 along with this work; if not, write to the Free Software
>>>>>     Foundation,
>>>>>      * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>>>>      *
>>>>>      * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>>>>     94065 USA
>>>>>      * or visit www.oracle.com <http://www.oracle.com> if you need
>>>>>     additional information or have any
>>>>>      * questions.
>>>>>      */
>>>>>
>>>>>     /*
>>>>>      * @test TestFullGCount.java
>>>>>      * @bug
>>>>>      * @summary
>>>>>      * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>>>>      *
>>>>>      */
>>>>>     import java.util.*;
>>>>>     import java.lang.management.*;
>>>>>
>>>>>
>>>>>
>>>>>     public class TestFullGCCount {
>>>>>
>>>>>     private String poolName = "CMS";
>>>>>     private String collectorName = "ConcurrentMarkSweep";
>>>>>
>>>>>     public static void main(String [] args) {
>>>>>
>>>>>         TestFullGCCount t = null;
>>>>>         if (args.length==2) {
>>>>>             t = new TestFullGCCount(args[0], args[1]);
>>>>>         } else {
>>>>>             System.out.println("Defaulting to monitor CMS pool and
>>>>>     collector.");
>>>>>             t = new TestFullGCCount();
>>>>>         }
>>>>>         t.run();
>>>>>
>>>>>     }
>>>>>
>>>>>     public TestFullGCCount(String pool, String collector) {
>>>>>         poolName = pool;
>>>>>         collectorName = collector;
>>>>>     }
>>>>>     public TestFullGCCount() {
>>>>>     }
>>>>>
>>>>>     public void run() {
>>>>>
>>>>>         int count = 0;
>>>>>         int iterations = 20;
>>>>>         long counts[] = new long[iterations];
>>>>>         boolean diffAlways2 = true; // assume we will fail
>>>>>
>>>>>         for (int i=0; i<iterations; i++) {
>>>>>             System.gc();
>>>>>             counts[i] = checkStats();
>>>>>             if (i>0) {
>>>>>                 if (counts[i] - counts[i-1] != 2) {
>>>>>                     diffAlways2 = false;
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>         if (diffAlways2) {
>>>>>             throw new RuntimeException("FAILED: difference in 
>>>>> count is
>>>>>     always 2.");
>>>>>         }
>>>>>         System.out.println("Passed.");
>>>>>     }
>>>>>
>>>>>     private long checkStats() {
>>>>>         long count = 0;
>>>>>         List<MemoryPoolMXBean> pools =
>>>>>     ManagementFactory.getMemoryPoolMXBeans();
>>>>>         List<GarbageCollectorMXBean> collectors =
>>>>>     ManagementFactory.getGarbageCollectorMXBeans();
>>>>>         for (int i=0; i<collectors.size(); i++) {
>>>>>             GarbageCollectorMXBean collector = collectors.get(i);
>>>>>             String name = collector.getName();
>>>>>             if (name.contains(collectorName)) {
>>>>>                 System.out.println(name + ": collection count = "
>>>>>                                    + collector.getCollectionCount());
>>>>>                 count = collector.getCollectionCount();
>>>>>             }
>>>>>         }
>>>>>         return count;
>>>>>
>>>>>     }
>>>>>
>>>>>     }
>>>>>
>>>>>
>>>>>     On 27/07/11 17:12, Krystal Mok wrote:
>>>>>>     Hi all,
>>>>>>
>>>>>>     I've been looking at a strange inconsistency of full GC count
>>>>>>     recorded by jvmstat and JMM counters. I'd like to know which 
>>>>>> ones
>>>>>>     of the following behaviors are by design, which ones are 
>>>>>> bugs, and
>>>>>>     which ones are just my misunderstanding. I apologize for 
>>>>>> making a
>>>>>>     short story long...
>>>>>>
>>>>>>     =====================================================
>>>>>>
>>>>>>     The counters involved:
>>>>>>
>>>>>>     * A jvmstat counter named "sun.gc.collector.1.invocations" keeps
>>>>>>     track of the number of pauses occured as a result of a major
>>>>>>     collection. It is used by utilities such as jstat as the 
>>>>>> source of
>>>>>>     "FGC" (full collection count), and the old gen collection 
>>>>>> count in
>>>>>>     Visual GC. It's updated by an TraceCollectorStats object.
>>>>>>     * A JMM counter, GCMemoryManager::_num_collections, keeps 
>>>>>> track of
>>>>>>     the number of collections that have ended. This counter is 
>>>>>> used as
>>>>>>     HotSpot's implementation of the JMX
>>>>>>     GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>>>>>     either a TraceMemoryManagerStats object or a
>>>>>>     TraceCMSMemoryManagerStats object.
>>>>>>
>>>>>>     To show the situation, I've made a screenshot of a VisualVM 
>>>>>> and a
>>>>>>     JConsole running side by side, both are monitoring the 
>>>>>> VisualVM's
>>>>>>     GC stats:
>>>>>>     
>>>>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png 
>>>>>>
>>>>>>     (I'll upload the screenshot to somewhere else if anybody 
>>>>>> can't see it)
>>>>>>     The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>>>>     In the screenshot, Visual GC reports that the old gen collection
>>>>>>     count is 20, while JConsole reports 10.
>>>>>>
>>>>>>     I see that there was this bug:
>>>>>>     6580448: CMS: Full GC collection count mismatch between
>>>>>>     GarbageCollectorMXBean and jvmstat (VisualGC)
>>>>>>     I don't think the current implementation has a bug in the sense
>>>>>>     that the two counters don't report the same number.
>>>>>>
>>>>>>     This behavior seems reasonable, but the naming of the value in
>>>>>>     these tools are confusing: both tools say "collections", but
>>>>>>     apparently the number in Visual GC means "number of pauses" 
>>>>>> where
>>>>>>     as the number in JConsole means "number of collection cycles".
>>>>>>     It'd be great if the difference could be documented 
>>>>>> somewhere, if
>>>>>>     that's the intended behavior.
>>>>>>
>>>>>>     And then the buggy behavior. Code demo posted on gist:
>>>>>>     https://gist.github.com/1106263
>>>>>>     Starting from JDK6u23, when using CMS without
>>>>>>     ExplicitGCInvokesConcurrent, System.gc() (or
>>>>>>     Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>>>>     make the JMM GC counter increment by 2 per invocation, while the
>>>>>>     jvmstat counter is only incremented by 1. I believe the 
>>>>>> latter is
>>>>>>     correct and the former needs some fixing.
>>>>>>
>>>>>>     =====================================================
>>>>>>
>>>>>>     My understanding of the behavior shown above:
>>>>>>
>>>>>>     1. The concurrent GC part:
>>>>>>
>>>>>>     There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>>>>     initial mark phase, and one in the final remark phase.
>>>>>>     To trigger a concurrent GC cycle, the CMS thread wakes up
>>>>>>     periodically to see if it shouldConcurrentCollect(), and 
>>>>>> trigger a
>>>>>>     cycle when the predicate returned true, or goes back to sleep if
>>>>>>     the predicate returned false. The whole concurrent GC cycle
>>>>>>     doesn't go through GenCollectedHeap::do_collection().
>>>>>>
>>>>>>     The jvmstat counter for old gen pauses is updated in
>>>>>>     CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>>>>>     covers both pause phases.
>>>>>>
>>>>>>     The JMM counter, however, is updated in the concurrent sweep
>>>>>>     phase, CMSCollector::sweep(bool asynch), if there was no
>>>>>>     concurrent mode failure; or it is updated in
>>>>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs) in 
>>>>>> case
>>>>>>     of a bailout due to concurrent mode failure (advertised as so in
>>>>>>     the code comments). So that's an increment by 1 per 
>>>>>> concurrent GC
>>>>>>     cycle, which does reflect the "number of collection cycles", 
>>>>>> fair
>>>>>>     enough.
>>>>>>
>>>>>>     So far so good.
>>>>>>
>>>>>>     2. The System.gc() part:
>>>>>>
>>>>>>     Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>>>>     stop-the-world full GC, which consists of only one pause, so
>>>>>>     "number of pauses" would equal "number of collections" in 
>>>>>> this case.
>>>>>>     It will go into GenCollectedHeap::do_collection(); both the
>>>>>>     jvmstat and the JMM GC counter gets incremented by 1 here,
>>>>>>
>>>>>>     TraceCollectorStats tcs(_gens[i]->counters());
>>>>>>     TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>>>>
>>>>>>     But, drilling down into:
>>>>>>     _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>>>>
>>>>>>     That'll eventually go into:
>>>>>>     CMSCollector::acquire_control_and_collect(bool full, bool
>>>>>>     clear_all_soft_refs)
>>>>>>
>>>>>>     System.gc() is user requested so that'll go further into
>>>>>>     mark-sweep-compact:
>>>>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>>>>     And here, it increments the JMM GC counter again (remember it 
>>>>>> was
>>>>>>     in the concurrent GC path too, to handle bailouts), even though
>>>>>>     this is still in the same collection. This leads to the "buggy
>>>>>>     behavior" mentioned earlier.
>>>>>>
>>>>>>     The JMM GC counter wasn't added to CMS until this fix got in:
>>>>>>     6581734: CMS Old Gen's collection usage is zero after GC 
>>>>>> which is
>>>>>>     incorrect
>>>>>>
>>>>>>     The code added to CMSCollector::do_compaction_work() works 
>>>>>> fine in
>>>>>>     the concurrent GC path, but interacts badly with the original
>>>>>>     logic in GenCollectedHeap::do_collection().
>>>>>>
>>>>>>     =====================================================
>>>>>>
>>>>>>     I thought all concurrent mode failures/interrupts come from
>>>>>>     GenCollectedHeap::do_collection(). If that's the case, then it
>>>>>>     seems unnecessary to update the JMM GC counter in
>>>>>>     CMSCollector::do_compaction_work(), simply removing it should 
>>>>>> fix
>>>>>>     the problem.
>>>>>>
>>>>>>     With that, I'd purpose the following (XS) change: (diff 
>>>>>> against HS20)
>>>>>>
>>>>>>     diff -r f0f676c5a2c6
>>>>>>     
>>>>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>>
>>>>>>     ---
>>>>>>     
>>>>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>>
>>>>>>          Tue Mar 15 19:30:16 2011 -0700
>>>>>>     +++
>>>>>>     
>>>>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>>
>>>>>>          Thu Jul 28 00:02:41 2011 +0800
>>>>>>     @@ -2022,9 +2022,6 @@
>>>>>>                                                     
>>>>>> _intra_sweep_estimate.padded_average());
>>>>>>        }
>>>>>>
>>>>>>     -  {
>>>>>>     -    TraceCMSMemoryManagerStats();
>>>>>>     -  }
>>>>>>        GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>>>>>          ref_processor(), clear_all_soft_refs);
>>>>>>        #ifdef ASSERT
>>>>>>
>>>>>>     The same goes for the changes in:
>>>>>>     7036199: Adding a notification to the implementation of
>>>>>>     GarbageCollectorMXBeans
>>>>>>
>>>>>>     =====================================================
>>>>>>
>>>>>>     P.S. Is there an "official" name for the counters that I 
>>>>>> referred
>>>>>>     to as "jvmstat counters" above? Is it just jvmstat, or 
>>>>>> PerfData or
>>>>>>     HSPERFDATA?
>>>>>
>>>>>
>>
>


More information about the hotspot-gc-dev mailing list