RFR: JDK-8061259: ParNew promotion failed is serialized on a lock

Bengt Rutisson bengt.rutisson at oracle.com
Mon Oct 27 09:11:59 UTC 2014


Hi Jungwoo,


On 2014-10-22 23:03, Jungwoo Ha wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8061259
>
> hotspot code: http://cr.openjdk.java.net/~rasbold/8061259/webrev.00/
>
> I created the patch for JDK9, but I can also observe this on JDK7 and 8.
>
> We are seeing several cases where GC worker threads are serialized on
> GCRareEvent_lock, causing double-digit-second pauses on a moderately
> sized heap.
>
> I have test code that reproduces the problem and shows that the patch fixes it.
>
> import java.util.LinkedList;
> class PromoFail {
>
>   static class Container {
>
>     Container p;
>     byte[] a;
>
>     public Container(int size) {
>       if (size > 0) {
>         p = new Container(size / 2);
>       } else {
>         p = null;
>       }
>       a = new byte[size];
>     }
>   }
>
>   public static void main(String args[]) {
>     if (args.length < 1) {
>       System.err.println("@ 1st argument must be size in MB.");
>       System.exit(1);
>     }
>     int size = 0;
>     try {
>       size = Integer.parseInt(args[0]) * 1024 * 1024;
>     } catch (NumberFormatException e) {
>       System.err.println("@ Cannot parse the size(=" + args[0] + ")");
>       System.exit(1);
>     }
>
>     // LinkedList will have more unbalanced workload.
>     LinkedList<Container> list = new LinkedList<Container>();
>
>     // 1st iteration adds element without removal.
>     // These are all live objects.
>     for (int i = 0; i < size / 4; i++) {
>       list.add(new Container(1));
>     }
>     // Promote to the old gen.
>     System.gc();
>
>     for (int containerSize = 2; containerSize < 512; containerSize *= 3) {
>       for (int i = 0; i < size / 4; i++) {
>         // Most likely removing an old object due to System.gc() from
>         // previous iteration.
>         // This will cause fragmentation.
>         list.remove();
>         list.add(new Container(containerSize));
>       }
>
>       {
>         System.gc();
>         Runtime runtime = Runtime.getRuntime();
>         System.out.println("@ Current Used: "
>             + (runtime.totalMemory() - runtime.freeMemory()) / 1024 / 1024);
>       }
>     }
>   }
> }
>
> You can run it with the following parameters.
>
> $ java -Xmx2g -Xms2g -Xmn1g -XX:+UseCMSFastPromotionFailure 
> -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=6 
> PromoFail 4
>
> Without UseCMSFastPromotionFailure
> #7: [GC (Allocation Failure) #7: [ParNew#6: 
> [CMS-concurrent-abortable-preclean: 0.003/0.203 secs] [Times: 
> user=0.20 sys=0.20 real=0.20 secs]
>  (promotion failed): 838912K->943744K(943744K), 62.0419534 secs]#8: 
> [CMS (concurrent mode failure): 1048441K->1048575K(1048576K), 
> 1.7731336 secs] 1609551K->1170596K(1992320K), [Metaspace: 
> 3547K->3547K(1056768K)], *63.8151607*secs] [Times: user=93.50 
> sys=22.12 real=63.82 secs]
>
> With UseCMSFastPromotionFailure
> #7: [GC (Allocation Failure) #7: [ParNew#6: 
> [CMS-concurrent-abortable-preclean: 0.004/0.204 secs] [Times: 
> user=0.30 sys=0.02 real=0.20 secs]
>  (promotion failed): 838912K->943744K(943744K), 2.0949545 secs]#8: 
> [CMS (concurrent mode failure): 1048363K->1048575K(1048576K), 
> 1.7517250 secs] 1609551K->1170595K(1992320K), [Metaspace: 
> 3546K->3546K(1056768K)], *3.8467384*secs] [Times: user=10.85 sys=1.04 
> real=3.85 secs]

Thanks for fixing this and for providing the nice reproducer. I get 
similar results when I run the reproducer on my workstation. Nice work!
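
For reference, the improvement in the quoted logs can be read off the final 
"real=" field of each GC line. Below is a minimal sketch of that arithmetic. 
The class and method names are hypothetical and only for illustration; they 
are not part of the webrev. With the two quoted log lines the ratio comes 
out at roughly 16.6.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the webrev): pulls the last "real=" wall-clock
// time out of a -XX:+PrintGCDetails line and compares two runs.
class GcPause {
    private static final Pattern REAL =
        Pattern.compile("real=(\\d+(?:\\.\\d+)?) secs");

    // Returns the last "real=N secs" value found in the line, in seconds.
    static double lastRealSeconds(String logLine) {
        Matcher m = REAL.matcher(logLine);
        double last = Double.NaN;
        while (m.find()) {
            last = Double.parseDouble(m.group(1));
        }
        if (Double.isNaN(last)) {
            throw new IllegalArgumentException("no real= field in: " + logLine);
        }
        return last;
    }

    public static void main(String[] args) {
        // Tail of the two log lines quoted above.
        String withoutFlag = "[Times: user=93.50 sys=22.12 real=63.82 secs]";
        String withFlag = "[Times: user=10.85 sys=1.04 real=3.85 secs]";
        double ratio = lastRealSeconds(withoutFlag) / lastRealSeconds(withFlag);
        System.out.println("pause ratio (without / with): " + ratio);
    }
}
```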

I've looked a bit at the webrev. A couple of comments:

Why do you use OrderAccess methods for writing and reading the 
_has_promo_failed flag in has_promo_failed() and set_promo_failed() ?
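
To make the question concrete, here is a rough Java analogy. It is not the 
webrev code: HotSpot is C++ and uses OrderAccess, and the class below is a 
hypothetical stand-in. It assumes the flag is only a hint that lets workers 
bail out early and that no other data is published through it. Under that 
assumption a relaxed (opaque) read may be enough, whereas acquire/release 
would matter if readers depended on other writes made before the flag was 
set.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical Java stand-in for a shared "promotion failed" flag.
class PromotionFailureFlag {
    private final AtomicBoolean failed = new AtomicBoolean(false);

    // Release store: writes made before this call are visible to a thread
    // that later observes the flag with an acquire load.
    void setPromotionFailed() { failed.setRelease(true); }

    // Acquire load: pairs with the release store above.
    boolean hasPromotionFailed() { return failed.getAcquire(); }

    // Opaque load: no ordering with other memory, but the update is
    // eventually observed. Sufficient only if the flag is a pure hint.
    boolean hasPromotionFailedRelaxed() { return failed.getOpaque(); }

    public static void main(String[] args) throws InterruptedException {
        PromotionFailureFlag flag = new PromotionFailureFlag();
        Thread[] workers = new Thread[6];
        for (int id = 0; id < workers.length; id++) {
            final int workerId = id;
            workers[id] = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    // Fast path: stop working once any worker has failed.
                    if (flag.hasPromotionFailedRelaxed()) return;
                    // Worker 0 simulates hitting a promotion failure.
                    if (workerId == 0 && i == 1_000) flag.setPromotionFailed();
                }
            });
            workers[id].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("promotion failed flag set: " + flag.hasPromotionFailed());
    }
}
```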

Can we write out the full word "promotion" instead of just "promo" in 
the variables and methods?

Can we change the name of the flag from UseCMSFastPromotionFailure to 
CMSFastPromotionFailure? Most CMS flags start with CMS and I don't think 
we need the "Use" prefix.

What do you think about making the flag true by default? At least for 
JDK 9. If we decide to backport to JDK 8 or 7 it might be a good idea to 
keep the default value as false.

Did you find the information provided by _fast_promo_failure_hitcount 
useful for debugging? If it is not too useful I would consider removing it 
since it is cluttering up the code a bit.


>
> I also ran it on the DaCapo benchmarks. Please see attached results.
> Those are the subset of DaCapo benchmarks that show any promotion failed pause.
> You can see some speedups and no performance regressions.


Thanks for doing some performance testing of this!

Bengt