Performance bug in and XXS patch for Parallel Old GC scanning large object arrays

Reingruber, Richard richard.reingruber at sap.com
Wed Mar 2 12:26:44 UTC 2016


Hi,

I would like to report a performance bug in the parallel old collector handling
large object arrays and contribute an XXS patch for the bug that improves gc pauses by
5x in a microbenchmark. The problem is that scanning of large object arrays is
not really parallelized.

When marking through large object arrays, the elements are scanned in chunks of
ObjArrayMarkingStride. This is done in

ObjArrayKlass::objarray_follow_contents(ParCompactionManager* cm, oop obj, int index)

In particular, the partially scanned array is pushed on the _objarray_stack only
*after* the elements are marked and pushed on the marking stack. It would help
parallelizing the work, if the array was made available to other threads (by
pushing it on _objarray_stack) *before* the current thread visits its stride of
elements.

diff -r 5c4f8192021e src/share/vm/gc/parallel/psCompactionManager.inline.hpp
--- a/src/share/vm/gc/parallel/psCompactionManager.inline.hpp	Tue Mar 01 09:42:19 2016 +0100
+++ b/src/share/vm/gc/parallel/psCompactionManager.inline.hpp	Wed Mar 02 12:17:07 2016 +0100
@@ -125,14 +125,14 @@
   T* const beg = base + beg_index;
   T* const end = base + end_index;
 
+  if (end_index < len) {
+    cm->push_objarray(obj, end_index); // Push the continuation.
+  }
+
   // Push the non-NULL elements of the next stride on the marking stack.
   for (T* e = beg; e < end; e++) {
     cm->mark_and_push<T>(e);
   }
-
-  if (end_index < len) {
-    cm->push_objarray(obj, end_index); // Push the continuation.
-  }
 }
 
 inline void ParCompactionManager::follow_contents(objArrayOop obj, int index) {

Below you can find a microbenchmark that demonstrates the performance bug and
the output of 2 runs without and with the patch on a linux x86_64 box with 32 hw
threads and 16 cores.

I'd suggest as well to reduce ObjArrayMarkingStride (512 by dflt) for the
parallel old gc for better parallelization. G1 and the parallel scavenger for
example use ParGCArrayScanChunk of 50). Both push the remainder *before*
iterating the array.

Regards,
Richard.

--
Richard Reingruber | SAP JVM | CORE PLATFORM,  SAP SE 
Pflichtangaben/Mandatory Disclosure Statements: http://www.sap.com/company/legal/impressum.epx  

###
### Benchmark Run 1 without the path
###


$ ./jdk9/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt)
OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)

Building large array with 100000 elements of small trees with depth 4
Heap stats: free:7769 MB (99.0 %)  max:7851 MB
done
Heap stats: free:6417 MB (81.7 %)  max:7851 MB
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 8759 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1856 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 3657 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 4280 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1823 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 2439 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1803 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1675 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1889 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1867 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1854 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1771 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1792 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1681 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1844 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1807 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1832 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1835 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1953 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1846 ms
d038402 at ld3357 [/net/usr.work/d038402/tmp] 17:55:04

###
### Benchmark Run 2 without the path
###

$ ./jdk9_mod/bin/java -showversion -Xms8g -Xmx8g -XX:+UseParallelOldGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestLargeObjectArray 20 100000 4
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-2016-03-01-165700.d038402.hs-rt)
OpenJDK 64-Bit Server VM (build 9-internal+0-2016-03-01-165700.d038402.hs-rt, mixed mode)

Building large array with 100000 elements of small trees with depth 4
Heap stats: free:7769 MB (99.0 %)  max:7851 MB
done
Heap stats: free:6376 MB (81.2 %)  max:7851 MB
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 3684 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 1073 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 425 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 369 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 436 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 334 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 352 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 420 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 391 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 393 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 356 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 352 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 348 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 354 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 340 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 370 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 363 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 370 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 355 ms
Calling System.gc()
Heap stats: free:6467 MB (82.4 %)  max:7851 MB
Full GC duration: 348 ms

###
### Benchmark source code
###

package test.gc;

public class TestLargeObjectArray {

    private static final long MB = 1 << 20;
    private int arraySize;
    private int treeDepth;
    private int iterations;
    private LittleTree[] arrayOfSmallTrees;

    public static void main(String[] args) {
        TestLargeObjectArray test = new TestLargeObjectArray();
        test.parseCmdLine(args);
        test.buildLargeArrayOfSmallTrees();
        test.run();
    }

    private void run() {
        while (iterations-- > 0) {
            long start = System.currentTimeMillis();
            message("Calling System.gc()");
            System.gc();
            long end = System.currentTimeMillis();
            heapStats();
            message("Full GC duration: " + (end - start) + " ms");
        }
    }

    public void buildLargeArrayOfSmallTrees() {
        try {
            message("Building large array with " + arraySize + " elements of small trees with depth " + treeDepth);
            heapStats();
            arrayOfSmallTrees = new LittleTree[arraySize];
            for (int i = 0; i < arraySize; i++) {
                arrayOfSmallTrees[i] = LittleTree.build(treeDepth);
            }
            message("done");
            heapStats();
        } catch (OutOfMemoryError oom) {
            arrayOfSmallTrees = null;
            message("OutOfMemory: please reduce array size or tree depth");
            System.exit(1);
        }
    }

    private void heapStats() {
        long free = Runtime.getRuntime().freeMemory();
        long max = Runtime.getRuntime().maxMemory();
        float freePercentage = round((float) free / (float) max * 100);
        message("Heap stats: free:" + free / MB + " MB (" + freePercentage + " %)" + "  max:" + max / MB + " MB");
    }

    private float round(float f) {
        return Math.round(f * 10) / 10f;
    }

    private void parseCmdLine(String[] args) {
        if (args.length != 3) {
            message();
            message("usage: java " + TestLargeObjectArray.class.getName()
                    + " <test iterations, e.g. 20> <large array size, e.g. 50000> <small tree depth, e.g. 4>");
            message();
            System.exit(1);
        }
        int i = 0;
        iterations = Integer.parseInt(args[i++]);
        arraySize = Integer.parseInt(args[i++]);
        treeDepth = Integer.parseInt(args[i++]);
    }

    private void message(String msg) {
        System.out.println(msg);
    }

    private void message() {
        System.out.println();
    }
}

class LittleTree {

    private static final int ARITY = 6;
    @SuppressWarnings("unused")
    private LittleTree[] subtrees;

    public LittleTree(LittleTree[] trees) {
        this.subtrees = trees;
    }

    public static LittleTree build(int treeDepth) {
        if (treeDepth == 0) {
            return null;
        }

        // recursion
        LittleTree[] subtrees = new LittleTree[ARITY];
        for (int i = 0; i < ARITY; i++) {
            subtrees[i] = build(treeDepth - 1);
        }
        return new LittleTree(subtrees);
    }
}


More information about the hotspot-gc-dev mailing list