RFR: 8250597: G1: Improve inlining around trim_queue

Kim Barrett kim.barrett at oracle.com
Fri Aug 7 02:01:58 UTC 2020

> On Aug 6, 2020, at 9:44 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> Hi Kim,
> On 05.08.20 02:26, Kim Barrett wrote:
>> Please review this change to G1ParScanThreadState to improve inlining
>> management.  This change doesn't seem to make any measurable performance
>> difference by itself.  However, without this, other changes being developed
>> may change gcc's inlining decisions in ways that often hurt performance,
>> mostly by heuristically cutting off inlining.
>> The primary inlining boundary is changed to trim_queue_to_threshold, which
>> is the main driver loop for task processing. This is instead of having the
>> boundary at copy_to_survivor_space, which is called per-task.
>> [...]
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8250597
>> Webrev:
>> https://cr.openjdk.java.net/~kbarrett/8250597/open.00/
>> https://cr.openjdk.java.net/~kbarrett/8250597/open.01/
>> https://cr.openjdk.java.net/~kbarrett/8250597/open.01.inc/
>> Testing:
>> mach5 tier1-5
>> various performance tests, finding no significant changes.
>  looks good, some suggestions:


> - G1ParScanThreadState::do_copy_to_survivor_space: the first assert is indented by an extra space.

I thought I'd fixed that.  Oh, I did fix it, in a later patch in my stack.
Fixed locally; the merge conflict with the later patch wasn't too bad.

> - maybe the two asserts in G1ParScanThreadState::trim_queue_partially() / G1ParScanThreadState::trim_queue could be moved to trim_queue_to_threshold(), checking that the overflow queue is empty, and the size is <= the passed threshold.

The assertions seem superfluous in trim_queue_to_threshold, as being
manifestly obvious.  For example, I wouldn't put an assert that the overflow
queue is empty immediately after the loop test that is checking whether it's
not empty.

I added those asserts because I was eliminating the loops on calls to
trim_queue_to_threshold, instead making that function responsible for the
additional outer loop. To me they seem somewhat useful where they are as
"documentation", with the alternative being removing them entirely as being
superfluous. (I think the previous behavior of trim_queue_to_threshold, with
the callers needing to loop on it, was confusingly inconsistent with its name.)

More information about the hotspot-gc-dev mailing list