[PATCH] enhancement to ParallelScavenge Full GC

Thomas Schatzl thomas.schatzl at oracle.com
Mon Dec 14 14:48:06 UTC 2015

Hi Tianyang,

On Mon, 2015-12-14 at 15:55 +0800, ray alex wrote:
> Hi all,
> Can I have reviews for the following enhancement?

Thanks for contributing :) Here is a quick review.

Note that I am typically not working on Parallel GC, so I am not very
proficient in this code and did not take the time to look into it in
more detail yet. Also, I did not do any performance testing, so I am
mostly criticizing coding style :)

- please remove obvious debug code, like most of the code in
psParallelCompact.cpp. Also, for whatever reason the TraceTime
constructor gets passed a print_phases() method that does not exist.

- I am not sure why the VM thread needs to have the cached
live-words-in-range information. For that matter, I think that putting
new members into the Thread class is a no-go because these members are
only used by the GC threads, penalizing everyone else. I think the
correct way of putting GC task thread specific data is adding these
members to the GCTaskThread class.

Since for worker threads Thread::current() is a GCTaskThread(), you can
still use it to retrieve the values (and add a
Thread::as_gctask_thread() method like as_worker_thread() to retrieve

Also, please create appropriate getters and/or setters for the
variables, taking into account below comments.

- in the new ParMarkBitMap::live_words_in_range() method, please do not
use l-value references. Hotspot code does not use them, and it does not
seem worth to change this "do as the surrounding code does" rule.

Following comments are for ParMarkBitMap::live_words_in_range():

- I do not see a point in opening extra scopes for straight-line code
(like in parMarkBitMap.cpp:161)

- the second part of the new live_words_in_range() looks like a verbatim
copy of (the now) live_words_in_range_helper(). Please refactor the code

E.g. I would prefer if the code looked like the following (actual naming
up to debate, I think you will find better names):

live_words_in_range() {
  if (live_words_in_range_in_cache()) {
    return cached-values();
  return live_words_in_range_helper();

or something similar. That would be a lot better to read and maintain
instead of the copy&paste.

- use of uint64_t for casting addresses/comparisons is wrong and just
unnecessary type conversion on 32 bit. The type the language provides
for this purpose is (u)intptr_t.

Also, you probably want to use the pointer_delta function to calculate
pointer deltas without overflow.

> Description:
> The change addresses the inefficiency in the live_words_in_range()
> routine in ParallelScavenge Full GC,
> which generally costs most of the time during the compacting phase.
>Our idea records last query information of live_words_in_range as well
>as the query result.
> Our approach reuses last query result so that can reduce much
> computation of bitmap searching.


> Patch:  as attached.
> Testing:
> Manual testing of full GC intensive applications including JOlden,
> Dacapo, SPECjvm2008 benchmarks.
> Results show that our enhancement can reduce much full GC time by
> 20~50%, depending on application scenarios.

I am not sure if JOlden (benchmark from '95), or Dacapo (from '09) is
representative of today's workloads (or specjvm2008), but the idea
sounds timeless :)

I will run (a slightly modified version, see pointer_delta() comment
above) through some basic testing, and then some benchmarks.

> Lei Tianyang
> Institute of Parallel and Distributed System (IPADS)
> Shanghai Jiao Tong University

Another procedural comment, you do not seem to have signed an Oracle's
Contributor agreement. We need that to be able to take your patch,
unless you are covered by some companies' OCA (but then we would need
you to use an email address from that company I think).

Please see http://www.oracle.com/technetwork/community/oca-486395.html#l
for more information.



More information about the hotspot-gc-dev mailing list