RFR(S) 8214388 CDS dumping fails with java heap fragmentation

Ioi Lam ioi.lam at oracle.com
Fri Nov 30 01:07:42 UTC 2018


http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/
https://bugs.openjdk.java.net/browse/JDK-8214388


Symptom:
========

"java -Xshare:dump" would intermittently fail with

    Unable to write archive heap ... due to fragmentation.

This usually happens when you try to dump many classes (e.g., 10000) with
a relatively small heap (e.g., 1g) and a lot of GC threads (e.g., 24).

(Example use case -- Eclipse IDE loads 15,000 classes with 512MB heap.)

When GC happens during class loading, some old G1 regions may be placed
at the top end of the heap (due to the large number of GC threads).

Later, when writing the archived heap, G1 tries to allocate contiguous
regions from the top end of the heap. This would fail due to the presence
of those old regions.
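
As a rough illustration of the failure mode above, here is a toy Python model,
not HotSpot code. It assumes the heap is a list of equally sized regions with
index 0 at the bottom, and that the archive allocator takes the highest free
region and then needs the regions directly below it to be free as well. The
names alloc_from_top and compact are made up for this sketch.

```python
FREE, OLD = "free", "old"

def alloc_from_top(regions, count):
    """Toy model: claim `count` contiguous free regions ending at the highest free one.

    Returns the claimed indices, or None if a non-free region is in the way.
    """
    free_indices = [i for i, state in enumerate(regions) if state == FREE]
    if not free_indices:
        return None
    top = free_indices[-1]
    bottom = top - count + 1
    if bottom < 0:
        return None
    span = list(range(bottom, top + 1))
    if all(regions[i] == FREE for i in span):
        return span
    return None

def compact(regions):
    """Toy stand-in for a single-threaded full GC: slide all old regions to the bottom."""
    old_count = sum(1 for state in regions if state == OLD)
    return [OLD] * old_count + [FREE] * (len(regions) - old_count)

# A GC during class loading left one old region near the top (index 5).
fragmented = [OLD, OLD, FREE, FREE, FREE, OLD, FREE, FREE]
```

With this input, alloc_from_top(fragmented, 3) returns None because region 5
is in the way, while the same request on compact(fragmented) succeeds.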


Fix:
====

As suggested by Stefan Johansson, we run a full GC with a single GC
thread. This guarantees that all old blocks will be moved to the bottom
end of the heap.

Because there's no API for specifying the number of GC threads dynamically,
and CDS dump time doesn't allocate lots of objects, I have statically forced
the number of threads to 1 in AdaptiveSizePolicy::calc_active_workers during
CDS dump time.

(This seems like a more direct way than assigning ParallelGCThreads ...)
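
The shape of that change can be sketched as follows. This is a Python
paraphrase, not the patch itself: choose_active_workers and its parameters are
hypothetical names, and the real AdaptiveSizePolicy::calc_active_workers has a
different signature and heuristic. See the webrev for the actual code.

```python
def choose_active_workers(total_workers, adaptive_estimate, dumping_cds_archive):
    """Hypothetical sketch of picking the number of active GC workers."""
    if dumping_cds_archive:
        # Dump time allocates few objects, so one worker is enough, and a
        # single worker lets the full GC move all old blocks to the bottom.
        return 1
    # Otherwise clamp the (assumed) adaptive estimate to [1, total_workers].
    return max(1, min(total_workers, adaptive_estimate))
```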


Notes:
======

1. Humongous regions cannot move. However, currently we don't do humongous
    allocations during CDS dump, so we should be fine. I have added diagnostic
    warnings so if fragmentation does happen in the future, the user can
    find out why.

2. Fixed a minor bug in
    HeapShared::check_closed_archive_heap_region_object_class

3. Fixed a bug in MetaspaceShared::read_extra_data, where the symbols/strings
    would be lost due to GC.

4. Added a stress test to successfully archive about 18MB of objects with
    -Xmx64m. This used to fail even with -Xmx512m on a Solaris box.

5. With default CDS archive generation during JDK build time, -Xmx128m is
    used. Before this fix, the EDEN region lives at the top of the heap during
    CDS dump time, and we end up with a 2MB gap between the archive regions
    and the top of the heap. Because the archive regions cannot move, at run
    time, using CDS would reduce the max humongous allocation by 2MB.

    With this fix, the archive regions are now placed at the very top of the
    heap, so the gap no longer exists.


Tests:
======

Running hs-tiers{1-6} for sanity.

Thanks
- Ioi



