Very slow promotion failures in ParNew / ParallelGC

Jon Masamitsu jon.masamitsu at
Mon Jan 11 21:10:49 UTC 2016


We'd be interested in the fix for 1). I'll have to go look at more code
before having a definite opinion on 2), but the way you describe it
makes it sound like something worth doing. Similarly with 3).


On 01/11/2016 09:59 AM, Tony Printezis wrote:
> Hi all,
> We have recently been investigating some very lengthy (several 
> minutes) promotion failures in ParNew, which also appear in 
> ParallelGC. We have identified a few issues and have some fixes to 
> address them. Here's a quick summary:
> 1) There's a scalability bottleneck when adding marks to the preserved 
> mark stack as there is only one stack, shared by all workers, and 
> pushes to it are protected by a mutex. This essentially serializes all 
> workers if there is a non-trivial amount of marks to be preserved. The 
> fix is similar to what's been implemented in G1 in JDK 9, which is to 
> introduce per-worker preserved mark stacks.
> 2) (More interestingly) I was perplexed by the huge number of marks 
> that I see getting preserved during promotion failure. I did a small 
> study with a test I can reproduce the issue with. The majority of the 
> preserved marks were 0x5 (i.e. "anonymously biased"). According to the 
> current logic, a mark is always preserved if it's biased, presumably 
> because it's assumed that the object is biased towards a specific 
> thread and we want to preserve that mark as it contains the thread 
> pointer. An anonymously biased mark contains no thread pointer, 
> though, so nothing is lost by not preserving it. The fix is to use a 
> different default mark value depending on whether biased locking is 
> enabled (0x5) or disabled (0x1, as it is now). During promotion 
> failures, marks are not preserved if they are equal to the default 
> value. After the promotion failure, and before the preserved marks 
> are reinstated, the mark of every forwarded object is set to the 
> default value.
> A few extra observations on this:
> - I don't know if the majority of objects we'll come across during 
> promotion failures will be anonymously biased (it is the case for 
> synthetic benchmarks). So, the above might pay off in certain cases 
> but not all. But I think it's still worth doing.
> - Even though the per-worker preserved mark stacks eliminate the big 
> scalability bottleneck, reducing (potentially dramatically) the number 
> of marks that are preserved helps in a couple of ways: a) avoids 
> allocating a lot of memory for the preserved mark stacks (which can 
> get very, very large in some cases) and b) avoids having to scan / 
> reclaim the preserved mark stacks post promotion failure, which 
> reduces the overall GC time further. Even the parallel time in ParNew 
> improves by a bit because there are a lot fewer stack pushes and 
> malloc calls.
> 3) In the case where lots of marks need to be preserved, we found that 
> using 64K stack segments, instead of 4K segments, speeds up the 
> preserved mark stack reclamation by a non-trivial amount (it's 3x to 
> 4x faster).
> We have fixes for all three issues above for ParNew. We're also going 
> to implement them for ParallelGC. For JDK 9, 1) is already 
> implemented, but 2) or 3) might also be worth doing.
> Is there interest in these changes?
> Tony
> -----
> Tony Printezis | JVM/GC Engineer / VM Team | Twitter
> @TonyPrintezis
> tprintezis at
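
To make items 1) and 2) of the quoted message concrete, here is a minimal
C++ sketch of per-worker preserved mark stacks combined with a default mark
that depends on whether biased locking is enabled. It is not HotSpot code and
not the proposed patch: the names Object, PreservedMarks, scan_slice and
simulate are hypothetical, the heap is modelled as a flat vector of headers,
and the value 0x3 only stands in for a forwarding pointer. Only the 0x1 and
0x5 mark values come from the message. The workers are run one after another
here to keep the sketch dependency-free; since each worker only touches its
own stack and its own slice, scan_slice could equally be run on one thread
per worker without a lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are not HotSpot types.
using Mark = std::uintptr_t;
constexpr Mark kUnlockedMark = 0x1;           // default when biased locking is off
constexpr Mark kAnonymouslyBiasedMark = 0x5;  // default when biased locking is on

struct Object { Mark mark; };
struct Entry { Object* obj; Mark mark; };

inline Mark default_mark(bool biased_locking) {
  return biased_locking ? kAnonymouslyBiasedMark : kUnlockedMark;
}

// One stack per GC worker: a push touches only the caller's own stack,
// so no mutex is needed on the hot path.
class PreservedMarks {
 public:
  PreservedMarks(std::size_t workers, bool biased_locking)
      : default_(default_mark(biased_locking)), stacks_(workers) {}

  // Called by worker `id` only. A mark equal to the default is skipped,
  // because restore() can recreate it without having saved it.
  void preserve_if_needed(std::size_t id, const Entry& e) {
    assert(id < stacks_.size());
    if (e.mark != default_) stacks_[id].push_back(e);
  }

  // Runs after the workers are done: reset every header to the default,
  // then reinstate the marks that were actually preserved.
  void restore(std::vector<Object>& heap) {
    for (Object& o : heap) o.mark = default_;
    for (auto& stack : stacks_) {
      for (const Entry& e : stack) e.obj->mark = e.mark;
      stack.clear();
    }
  }

  std::size_t preserved_count() const {
    std::size_t n = 0;
    for (const auto& stack : stacks_) n += stack.size();
    return n;
  }

 private:
  Mark default_;
  std::vector<std::vector<Entry>> stacks_;
};

// Work done by one worker: visit a disjoint, strided slice of the heap.
void scan_slice(PreservedMarks& pm, std::vector<Object>& heap,
                std::size_t worker, std::size_t workers) {
  for (std::size_t i = worker; i < heap.size(); i += workers) {
    pm.preserve_if_needed(worker, Entry{&heap[i], heap[i].mark});
    heap[i].mark = 0x3;  // pretend a forwarding pointer overwrote the header
  }
}

// Simulates a failed promotion; returns how many marks had to be preserved.
std::size_t simulate(std::vector<Object>& heap, std::size_t workers,
                     bool biased_locking) {
  PreservedMarks pm(workers, biased_locking);
  for (std::size_t w = 0; w < workers; ++w) scan_slice(pm, heap, w, workers);
  const std::size_t preserved = pm.preserved_count();
  pm.restore(heap);
  return preserved;
}
```

With a heap made up almost entirely of 0x5 headers, simulate(heap, 4, true)
preserves only the few headers that differ from 0x5, whereas
simulate(heap, 4, false) preserves every header, which mirrors the large
preserved mark stacks described in 2). In a truly multi-threaded version,
padding each per-worker stack to its own cache line would probably be worth
considering to avoid false sharing between workers.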

More information about the hotspot-gc-dev mailing list