RFR: 8250597: G1: Improve inlining around trim_queue

Thomas Schatzl thomas.schatzl at oracle.com
Fri Aug 7 07:57:17 UTC 2020


On 07.08.20 04:01, Kim Barrett wrote:
>> On Aug 6, 2020, at 9:44 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>> Hi Kim,
>> On 05.08.20 02:26, Kim Barrett wrote:
>>> Please review this change to G1ParScanThreadState to improve inlining
>>> management.  This change doesn't seem to make any measurable performance
>>> difference by itself.  However, without this, other changes being developed
>>> may change gcc's inlining decisions in ways that often hurt performance,
>>> mostly by heuristically cutting off inlining.
>>> The primary inlining boundary is changed to trim_queue_to_threshold, which
>>> is the main driver loop for task processing. This is instead of having the
>>> boundary at copy_to_survivor_space, which is called per-task.
>>> [...]
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8250597
>>> Webrev:
>>> https://cr.openjdk.java.net/~kbarrett/8250597/open.00/
>>> https://cr.openjdk.java.net/~kbarrett/8250597/open.01/
>>> https://cr.openjdk.java.net/~kbarrett/8250597/open.01.inc/
>>> Testing:
>>> mach5 tier1-5
>>> various performance tests, finding no significant changes.
>>   looks good, some suggestions:
> Thanks.
>> - G1ParScanThreadState::do_copy_to_survivor_space: the first assert is
 >> indented by an extra space.
> I thought I'd fixed that.  Oh, I did fix it, in a later patch in my stack.
> Fixed locally; the merge conflict with the later patch wasn't too bad.
>> - maybe the two asserts in G1ParScanThreadState::trim_queue_partially() / G1ParScanThreadState::trim_queue could be moved to trim_queue_to_threshold(), checking that the overflow queue is empty, and the size is <= the passed threshold.
> The assertions seem superfluous in trim_queue_to_threshold, as being
> manifestly obvious.  For example, I wouldn't put an assert that the overflow
> queue is empty immediately after the loop test that is checking whether it's
> not empty.
> I added those asserts because I was eliminating the loops on calls to
> trim_queue_to_threshold, instead making that function responsible for the
> additional outer loop. To me they seem somewhat useful where they are as
> "documentation", with the alternative being removing them entirely as being
> superfluous. 

I had the same thoughts, particularly about their usefulness and so 
suggested this alternative. Fine with me keeping them as they are if you 
think it makes the code clearer.

 >(I think the previous behavior of trim_queue_to_threshold, with
 > the callers needing to loop on it, was confusingly inconsistent with
 > its name.)

The current code is nicer, yes. Thanks for refactoring this a bit. Also 
the changes in the allocation path where the old-gen-is-full check has 
been moved (I did not check performance with this part of the change).

Looks good.


More information about the hotspot-gc-dev mailing list