Looks good to me; ship it!

-- ramki

On 7/29/2011 4:13 PM, Kevin Walls wrote:

Actually I missed the other comment that we could go further and remove the unneeded constructor.

http://cr.openjdk.java.net/~kevinw/7072527/webrev.01/

Thanks
Kevin

On 29/07/11 17:35, Kevin Walls wrote:
Hi --

I got this ready in a webrev:

http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/

The new test is passing, as are the old ones. 8-)

Also, this change means that a CMS cycle ending in concurrent mode failure now counts as one collection. One could argue that either way (should it be 2 collections?) - but I'm thinking if we are counting "completed" collections then we are now counting correctly!

Thanks
Kevin


On 29/07/11 02:08, Y. S. Ramakrishna wrote:
I filed: 7072527 CMS: JMM GC counters overcount in some cases

On 07/28/11 17:29, Y. S. Ramakrishna wrote:
Hi Kevin --

thanks for jumping on this! More inline below ...

On 07/28/11 09:33, Krystal Mok wrote:
Hi Kevin,

Thank you for taking care of this, and it's good to see the problem is verified.

I think whether or not the suggested fix is sufficient depends on what paths can reach CMSCollector::do_compaction_work(). If all paths that can reach CMSCollector::do_compaction_work() come from GenCollectedHeap::do_collection(), then the fix should be good to go. Otherwise it'll need a better workaround.

I believe all concurrent mode failures/interrupts (which includes the System.gc() case) does come from GenCollectedHeap::do_collection(), but I'm not exactly sure about this, could anybody please clarify on it?

Yes, i believe this is indeed the case, and my browsing of the
code using cscope seemed to confirm that belief.

More below ...


Regards,
Kris Mok

On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls <kevin.wa...@oracle.com <mailto:kevin.wa...@oracle.com>> wrote:

    __
    Hi --

6580448 was marked as a duplicate of 6581734, which fixed the fact that CMS collections were just not counted at all - with CMS, only a
    stop the world full gc would be counted in the stats.

    But looks like you're right...  Here is a quick variation of the
testcase from 6581734 which shows the same thing, and this verifies the same, and is solved by ExplicitGCInvokesConcurrent. If there is
    no other feedback I can test if the removal of the
    TraceCMSMemoryManagerStats() call in
    CMSCollector::do_compaction_work is all we need...


Kevin, yes, it would be great if you could verify this and push the fix.
I am not sure if the push would need to wait for the signing of OCA
from Kris, but best to check with Those Who Would Know Such Things.

Since the original CR has been closed, i'll open one momentarily and
can make you RE (if that's OK with you). I'll be happy to serve as
reviewer of the change.

As regards the jstat counter reporting two pauses per concurrent
CMS cycle, I am of two minds on what the original intention
was. I'd have originally regarded the double increment as a
bug, but as you state it is really two pauses, even if part of
a single cycle. And it makes sense to count them as two. I
agree that this should be documented and left alone, given
how long we have had this behaviour, and the alternative
(of counting cycles, rather than pauses) may be no better
(or arguably worse). There's actually an open CR for this
which we can redirect into a CR to update the relevant documentation.

-- ramki


    Regards
    Kevin


    /*
     * Copyright (c) 2011, Oracle and/or its affiliates. All rights
    reserved.
     * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
     *
* This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as
     * published by the Free Software Foundation.
     *
* This code is distributed in the hope that it will be useful, but
    WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
     * version 2 for more details (a copy is included in the LICENSE
    file that
     * accompanied this code).
     *
* You should have received a copy of the GNU General Public License
    version
     * 2 along with this work; if not, write to the Free Software
    Foundation,
     * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
     *
     * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
    94065 USA
     * or visit www.oracle.com <http://www.oracle.com> if you need
    additional information or have any
     * questions.
     */

    /*
     * @test TestFullGCount.java
     * @bug
     * @summary
     * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
     *
     */
    import java.util.*;
    import java.lang.management.*;



    public class TestFullGCCount {

    private String poolName = "CMS";
    private String collectorName = "ConcurrentMarkSweep";

    public static void main(String [] args) {

        TestFullGCCount t = null;
        if (args.length==2) {
            t = new TestFullGCCount(args[0], args[1]);
        } else {
            System.out.println("Defaulting to monitor CMS pool and
    collector.");
            t = new TestFullGCCount();
        }
        t.run();

    }

    public TestFullGCCount(String pool, String collector) {
        poolName = pool;
        collectorName = collector;
    }
    public TestFullGCCount() {
    }

    public void run() {

        int count = 0;
        int iterations = 20;
        long counts[] = new long[iterations];
        boolean diffAlways2 = true; // assume we will fail

        for (int i=0; i<iterations; i++) {
            System.gc();
            counts[i] = checkStats();
            if (i>0) {
                if (counts[i] - counts[i-1] != 2) {
                    diffAlways2 = false;
                }
            }
        }
        if (diffAlways2) {
throw new RuntimeException("FAILED: difference in count is
    always 2.");
        }
        System.out.println("Passed.");
    }

    private long checkStats() {
        long count = 0;
        List<MemoryPoolMXBean> pools =
    ManagementFactory.getMemoryPoolMXBeans();
        List<GarbageCollectorMXBean> collectors =
    ManagementFactory.getGarbageCollectorMXBeans();
        for (int i=0; i<collectors.size(); i++) {
            GarbageCollectorMXBean collector = collectors.get(i);
            String name = collector.getName();
            if (name.contains(collectorName)) {
                System.out.println(name + ": collection count = "
                                   + collector.getCollectionCount());
                count = collector.getCollectionCount();
            }
        }
        return count;

    }

    }


    On 27/07/11 17:12, Krystal Mok wrote:
    Hi all,

    I've been looking at a strange inconsistency of full GC count
recorded by jvmstat and JMM counters. I'd like to know which ones of the following behaviors are by design, which ones are bugs, and which ones are just my misunderstanding. I apologize for making a
    short story long...

    =====================================================

    The counters involved:

    * A jvmstat counter named "sun.gc.collector.1.invocations" keeps
    track of the number of pauses occured as a result of a major
collection. It is used by utilities such as jstat as the source of "FGC" (full collection count), and the old gen collection count in
    Visual GC. It's updated by an TraceCollectorStats object.
* A JMM counter, GCMemoryManager::_num_collections, keeps track of the number of collections that have ended. This counter is used as
    HotSpot's implementation of the JMX
    GarbageCollectorMXBean.getCollectionCount(). It's updated by
    either a TraceMemoryManagerStats object or a
    TraceCMSMemoryManagerStats object.

To show the situation, I've made a screenshot of a VisualVM and a JConsole running side by side, both are monitoring the VisualVM's
    GC stats:
http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png (I'll upload the screenshot to somewhere else if anybody can't see it)
    The VisualVM instance is running on JDK6u26, with ParNew+CMS.
    In the screenshot, Visual GC reports that the old gen collection
    count is 20, while JConsole reports 10.

    I see that there was this bug:
    6580448: CMS: Full GC collection count mismatch between
    GarbageCollectorMXBean and jvmstat (VisualGC)
    I don't think the current implementation has a bug in the sense
    that the two counters don't report the same number.

    This behavior seems reasonable, but the naming of the value in
    these tools are confusing: both tools say "collections", but
apparently the number in Visual GC means "number of pauses" where
    as the number in JConsole means "number of collection cycles".
It'd be great if the difference could be documented somewhere, if
    that's the intended behavior.

    And then the buggy behavior. Code demo posted on gist:
    https://gist.github.com/1106263
    Starting from JDK6u23, when using CMS without
    ExplicitGCInvokesConcurrent, System.gc() (or
    Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
    make the JMM GC counter increment by 2 per invocation, while the
jvmstat counter is only incremented by 1. I believe the latter is
    correct and the former needs some fixing.

    =====================================================

    My understanding of the behavior shown above:

    1. The concurrent GC part:

    There are 2 pauses in a CMS concurrent GC cycle, one in the
    initial mark phase, and one in the final remark phase.
    To trigger a concurrent GC cycle, the CMS thread wakes up
periodically to see if it shouldConcurrentCollect(), and trigger a
    cycle when the predicate returned true, or goes back to sleep if
    the predicate returned false. The whole concurrent GC cycle
    doesn't go through GenCollectedHeap::do_collection().

    The jvmstat counter for old gen pauses is updated in
    CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
    covers both pause phases.

    The JMM counter, however, is updated in the concurrent sweep
    phase, CMSCollector::sweep(bool asynch), if there was no
    concurrent mode failure; or it is updated in
CMSCollector::do_compaction_work(bool clear_all_soft_refs) in case
    of a bailout due to concurrent mode failure (advertised as so in
the code comments). So that's an increment by 1 per concurrent GC cycle, which does reflect the "number of collection cycles", fair
    enough.

    So far so good.

    2. The System.gc() part:

    Without ExplicitGCInvokesConcurrent set, System.gc() does a
    stop-the-world full GC, which consists of only one pause, so
"number of pauses" would equal "number of collections" in this case.
    It will go into GenCollectedHeap::do_collection(); both the
    jvmstat and the JMM GC counter gets incremented by 1 here,

    TraceCollectorStats tcs(_gens[i]->counters());
    TraceMemoryManagerStats tmms(_gens[i]->kind());

    But, drilling down into:
    _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);

    That'll eventually go into:
    CMSCollector::acquire_control_and_collect(bool full, bool
    clear_all_soft_refs)

    System.gc() is user requested so that'll go further into
    mark-sweep-compact:
    CMSCollector::do_compaction_work(bool clear_all_soft_refs)
And here, it increments the JMM GC counter again (remember it was
    in the concurrent GC path too, to handle bailouts), even though
    this is still in the same collection. This leads to the "buggy
    behavior" mentioned earlier.

    The JMM GC counter wasn't added to CMS until this fix got in:
6581734: CMS Old Gen's collection usage is zero after GC which is
    incorrect

The code added to CMSCollector::do_compaction_work() works fine in
    the concurrent GC path, but interacts badly with the original
    logic in GenCollectedHeap::do_collection().

    =====================================================

    I thought all concurrent mode failures/interrupts come from
    GenCollectedHeap::do_collection(). If that's the case, then it
    seems unnecessary to update the JMM GC counter in
CMSCollector::do_compaction_work(), simply removing it should fix
    the problem.

With that, I'd purpose the following (XS) change: (diff against HS20)

    diff -r f0f676c5a2c6
src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
    ---
a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
         Tue Mar 15 19:30:16 2011 -0700
    +++
b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
         Thu Jul 28 00:02:41 2011 +0800
    @@ -2022,9 +2022,6 @@
_intra_sweep_estimate.padded_average());
       }

    -  {
    -    TraceCMSMemoryManagerStats();
    -  }
       GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
         ref_processor(), clear_all_soft_refs);
       #ifdef ASSERT

    The same goes for the changes in:
    7036199: Adding a notification to the implementation of
    GarbageCollectorMXBeans

    =====================================================

P.S. Is there an "official" name for the counters that I referred to as "jvmstat counters" above? Is it just jvmstat, or PerfData or
    HSPERFDATA?




Reply via email to