RFR: 8250597: G1: Improve inlining around trim_queue
thomas.schatzl at oracle.com
Fri Aug 7 07:57:17 UTC 2020
On 07.08.20 04:01, Kim Barrett wrote:
>> On Aug 6, 2020, at 9:44 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>> Hi Kim,
>> On 05.08.20 02:26, Kim Barrett wrote:
>>> Please review this change to G1ParScanThreadState to improve inlining
>>> management. This change doesn't seem to make any measurable performance
>>> difference by itself. However, without this, other changes being developed
>>> may change gcc's inlining decisions in ways that often hurt performance,
>>> mostly by heuristically cutting off inlining.
>>> The primary inlining boundary is changed to trim_queue_to_threshold, which
>>> is the main driver loop for task processing. This is instead of having the
>>> boundary at copy_to_survivor_space, which is called per-task.
>>> mach5 tier1-5
>>> various performance tests, finding no significant changes.
>> looks good, some suggestions:
>> - G1ParScanThreadState::do_copy_to_survivor_space: the first assert is
>> indented by an extra space.
> I thought I'd fixed that. Oh, I did fix it, in a later patch in my stack.
> Fixed locally; the merge conflict with the later patch wasn't too bad.
>> - maybe the two asserts in G1ParScanThreadState::trim_queue_partially() / G1ParScanThreadState::trim_queue could be moved to trim_queue_to_threshold(), checking that the overflow queue is empty, and the size is <= the passed threshold.
> The assertions seem superfluous in trim_queue_to_threshold, as being
> manifestly obvious. For example, I wouldn't put an assert that the overflow
> queue is empty immediately after the loop test that is checking whether it's
> not empty.
> I added those asserts because I was eliminating the loops on calls to
> trim_queue_to_threshold, instead making that function responsible for the
> additional outer loop. To me they seem somewhat useful where they are as
> "documentation", with the alternative being removing them entirely as being
I had the same thoughts, particularly about their usefulness and so
suggested this alternative. Fine with me keeping them as they are if you
think it makes the code clearer.
>(I think the previous behavior of trim_queue_to_threshold, with
> the callers needing to loop on it, was confusingly inconsistent with
> its name.)
The current code is nicer, yes. Thanks for refactoring this a bit. Also
the changes in the allocation path where the old-gen-is-full check has
been moved (I did not check performance with this part of the change).
More information about the hotspot-gc-dev