[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
leihouyju at gmail.com
Mon Sep 16 14:02:40 UTC 2019
FYI, the evaluation results on OpenJDK 14 are plotted in the attachment. I
compute the full GC throughput by dividing the heap size before full GC by
the GC pause time, and the results are arithmetic mean values of ten runs
after a warm-up run. The evaluation was conducted on a machine with dual
Intel Xeon E5-2618L v3 CPUs (2 sockets, 16 physical cores with SMT
enabled) and 64 GB DRAM.
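
The throughput metric described above can be sketched as follows. This is only an illustration, not the script used for the attached plots: the function names and the (heap bytes, pause seconds) sample format are hypothetical, and it assumes one full GC sample per run with the mean taken over per-run throughputs.

```python
from statistics import mean


def full_gc_throughput(heap_before_bytes, pause_seconds):
    """Throughput of one full GC: heap size before the GC divided by pause time."""
    if pause_seconds <= 0:
        raise ValueError("pause time must be positive")
    return heap_before_bytes / pause_seconds


def mean_throughput(runs, warmup_runs=1):
    """Arithmetic mean of per-run throughput, skipping the leading warm-up runs.

    `runs` is a list of (heap_before_bytes, pause_seconds) tuples, one per run
    (assumed format, not taken from the original evaluation).
    """
    measured = runs[warmup_runs:]
    if not measured:
        raise ValueError("no measured runs left after discarding warm-up")
    return mean(full_gc_throughput(h, p) for h, p in measured)
```

With one warm-up run followed by ten measured runs, `mean_throughput(samples)` returns the average in bytes per second; dividing by 2**20 gives MiB/s.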
Institute of Parallel and Distributed Systems(IPADS),
School of Software,
Shanghai Jiao Tong University
Stefan Johansson <stefan.johansson at oracle.com> wrote on Thu, Sep 12, 2019 at 5:34 AM:
> Hi Haoyu,
> I recently came across your patch and I would like to pick up on some of
> the things Kim mentioned in his mails. I especially want to evaluate
> and investigate if this is a technique we can use to improve the other GCs
> as well. To start that work I want to take the patch for a spin in our
> internal performance testing. The patch doesn’t apply cleanly to the latest
> JDK repository, so if you could provide an updated patch that would be very
> It would also be great if you could share some more information around the
> results presented in the paper. For example, it would be good to get the
> full command lines for the different benchmarks so we can run them locally
> and reproduce the results you’ve seen.
> On 12 March 2019 at 03:21, Haoyu Li <leihouyju at gmail.com> wrote:
> Hi Kim,
> Thanks for reviewing and testing the patch. If there are any failures or
> performance degradation relevant to the work, please let me know and I'll
> be very happy to keep improving it. Also, any suggestions about code
> improvements are much appreciated.
> I'm not quite sure whether G1 and Shenandoah have a similar region
> dependency issue, since I haven't studied their GC behaviors before. If
> they have, I'm also willing to propose a more general optimization.
> As to the memory overhead, I believe it will be low because this patch
> exploits empty regions in the young space rather than off-heap memory to
> allocate shadow regions, and also reuses the *_source_region* field of
> each *RegionData* to record the corresponding shadow region index. We
> only introduce a new integer field *_shadow* in the RegionData class to
> indicate the status of a region, a global *GrowableArray _free_shadow* to
> store the indices of shadow regions, and a global *Monitor* to protect
> the array. This information might help if the memory overhead needs to be
> Looking forward to your insight.
> Best Regards,
> Haoyu Li,
> Institute of Parallel and Distributed Systems(IPADS),
> School of Software,
> Shanghai Jiao Tong University
> Kim Barrett <kim.barrett at oracle.com> wrote on Tue, Mar 12, 2019 at 6:11 AM:
>> > On Mar 11, 2019, at 1:45 AM, Kim Barrett <kim.barrett at oracle.com>
>> >> On Jan 24, 2019, at 3:58 AM, Haoyu Li <leihouyju at gmail.com> wrote:
>> >> Hi Kim,
>> >> I have ported my patch to OpenJDK 13 according to your instructions in
>> your last mail, and the patch is attached in this mail. The patch does not
>> change much since PSGC is indeed pretty stable.
>> >> Also, I evaluated the correctness and performance of PS full GC with
>> benchmarks from the DaCapo, SPECjvm2008, and JOlden suites on a machine with
>> dual Intel Xeon E5-2618L v3 CPUs (16 physical cores), 64 GB DRAM, and Linux
>> kernel 4.17. The evaluation result, indicating a 1.9x GC throughput
>> improvement on average, is attached, too.
>> >> However, I have no idea how to further test this patch for both
>> correctness and performance. Can I please get any guidance from you or some
>> > Sorry I missed that you had sent an updated version of the patch.
>> > I’ve run the full regression suite across Oracle-supported platforms.
>> There are some
>> > failures, but there are almost always some failures in the later tiers
>> right now. I’ll start
>> > looking at them tomorrow to figure out whether any of them are relevant.
>> > I’m also planning to run some of our performance benchmarks.
>> > I’ve lightly skimmed the proposed changes. There might be some code
>> > to be made.
>> > I’m also wondering if this technique applies to other collectors. It
>> seems like both G1 and
>> > Shenandoah full gc’s might have similar issues? If so, a solution that
>> is ParallelGC-specific
>> > is less interesting than one that has broader applicability. Though
>> maybe this optimization
>> > is less important for G1 and Shenandoah, since they actively try to
>> avoid full gc’s.
>> > I’m also not clear on how much additional memory might be temporarily
>> allocated by this
>> > mechanism.
>> I’ve created a CR for this: