RFR: 8133051: Concurrent refinement threads may be activated and deactivated at random
jon.masamitsu at oracle.com
Tue Apr 5 22:31:37 UTC 2016
I saw that this review is on hold. I just want to offer a suggestion
(apologies if it is already there; I only had a cursory look). In
run_service() count the
number of buffers processed and report that number in the deactivate
On 4/4/2016 11:48 AM, Kim Barrett wrote:
> Please review this change to the G1 concurrent refinement thread
> controller. This change addresses unnecessary activation when there
> are many threads and few buffers to be processed. It also addresses
> delayed activation due to mis-configuration of the dirty card queue
> set's notification mechanism.
> This change continues to use (more or less) the existing control
> model, only avoiding obviously wasted effort or undesirable delays.
> Further enhancements to the control model will be made under
> JDK-8137022 or subtasks from that.
> - Changed the G1 concurrent refinement thread activation controller to
> use a minimum buffer count step between (de)activation values for the
> threads. This is accomplished by having a minimum yellow zone size,
> based on the number of refinement threads. This avoids waking up more
> refinement threads than there are buffers available to process. (It
> is, of course, still possible for a refinement thread to wake up and
> discover it has no work to do, because of progress by other threads.
> But at least we're no longer waking up threads with a near guarantee
> they won't find work to do.)
> - As part of the above, changed G1ConcRefinementThresholdStep to have
> a minimum value of one, a default value of 2, and to be used to
> determine a lower bound on the thread activation step size. A larger
> step size makes it less likely a thread will be woken up and discover
> other threads have already completed the work "allocated" to it. Too
> large a minimum may overly restrict the number of refinement threads
> being activated, leading to missed pause targets.
> - Changed the threshold for activation of the primary concurrent
> refinement thread via notification from the dirty card queue set upon
> enqueue of a new buffer. It was previously using a notification
> threshold of green_zone * (1 + predictor_sigma), rather than the
> "normal" activation threshold calculated using the green_zone value
> and threshold steps. Using default configuration parameters, this
> could lead to a significantly larger activation threshold,
> particularly as the green_zone value grows, which could lead to a much
> larger number of pending buffers for pause-time update_rs to process,
> leading to missed update_rs time targets and unnecessary back pressure
> on the green_zone size.
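
To make the threshold scheme in the bullets above concrete, here is a minimal sketch in Python. It is an illustration only, not the HotSpot code: the function names, the even spacing of thresholds across the yellow zone, and the sample values (including predictor_sigma = 0.5) are all assumptions. It widens the yellow zone so consecutive threads are at least min_step buffers apart, and shows the old notification threshold formula for comparison.

```python
import math


def activation_thresholds(green, yellow, n_threads, min_step=2):
    """Return one (activate, deactivate) buffer-count pair per thread.

    Hypothetical model: the yellow zone is widened, if necessary, so that
    it holds at least `min_step` buffers per refinement thread.
    """
    if n_threads == 0:
        return []
    # Minimum yellow zone size based on the number of refinement threads.
    yellow_size = max(yellow - green, min_step * n_threads)
    step = yellow_size / n_threads
    thresholds = []
    for i in range(n_threads):
        activate = green + math.ceil(step * (i + 1))
        deactivate = green + math.floor(step * i)
        thresholds.append((activate, deactivate))
    return thresholds


def old_notification_threshold(green, predictor_sigma):
    """Previous notification threshold: green_zone * (1 + predictor_sigma)."""
    return green * (1 + predictor_sigma)


if __name__ == "__main__":
    # A narrow configured yellow zone with 18 threads gets widened to 36 buffers.
    for i, (on, off) in enumerate(activation_thresholds(4, 8, 18)):
        print(f"thread {i}: activate at {on}, deactivate at {off}")
    # With a large green zone the old notification threshold sits well above
    # the primary thread's stepped activation threshold.
    print(old_notification_threshold(400, 0.5))
    print(activation_thresholds(400, 1200, 18)[0])
```

With these assumed numbers, the primary thread's stepped activation threshold is 445 buffers, while the old notification formula would not wake it until 600.
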
> Comparing runs of specjbb2015 on Linux-x64 with 24 logical processors
> (so 18 refinement threads with the default configuration), with these
> changes we see a noticeable increase in the steady state green zone
> value as compared to the baseline:
>           baseline   modified
> mean         387        437
> median       390        445
> stddev        68         67
> min          121        167
> max          568        575
> across ~375 collection pauses for each case.
> We're still using the same green zone adjustment (the first 40 or so
> pauses show identical green_zone growth in this comparison). The
> difference is in the activation of the primary (zeroth) concurrent
> refinement thread by dirty card queue set notification. After a pause
> we'll often see a burst of concurrent refinement thread activity, as
> dirty cards scheduled for revisiting are processed. Once that's done,
> the modified version typically activates / runs / deactivates just the
> primary thread as mutators enqueue buffers, keeping the number of
> buffers close to the green zone target. The baseline allows the
> number of buffers to grow until several threads are activated (4 with
> the default configuration used). Sometimes the baseline starts them
> too late (or not at all), allowing the number of buffers to
> significantly exceed the green zone target when a pause occurs,
> leading to the update_rs phase exceeding its time goal.
> As a result of this change, ConcurrentG1Refine construction no longer
> needs the predictor argument (though it may return with future
> improvements to the control model as part of JDK-8137022).
> - Command line -XX:G1ConcRefinementThreads=0 now creates zero
> concurrent refinement threads, rather than using the ergonomic default
> even though zero is explicitly specified. This will result in
> mutator-only concurrent processing of dirty card buffers, which may
> result in missed pause targets. (Mutator-only processing being
> insufficient is one of the issues discussed in JDK-8137022.) The use
> of a zero value is mostly intended for testing, rather than production
> use.
> - Command line -XX:G1ConcRefinementRedZone=0 is no longer documented
> as disabling concurrent processing. So far as I can tell, it never
> did so. Rather, it meant that buffers completed by mutator threads
> were always processed by them (and that only when
> G1UseAdaptiveConcRefinement was off). Buffers enqueued for other
> reasons would still be processed by the concurrent refinement threads.
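
The G1ConcRefinementThreads=0 change above can be sketched as follows. This is a hedged illustration, not HotSpot code: the function names are hypothetical, and the ergonomic default is assumed to follow the ParallelGCThreads heuristic (all processors up to 8, then 5/8 of the remainder), which is consistent with the 18 threads on 24 logical processors mentioned earlier.

```python
def ergonomic_thread_count(ncpus):
    """Assumed ergonomic default: all CPUs up to 8, then 5/8 of the rest."""
    if ncpus <= 8:
        return ncpus
    return 8 + (ncpus - 8) * 5 // 8


def refinement_thread_count(flag_value, ncpus):
    """Pick the number of concurrent refinement threads.

    `flag_value` is None when the option was not given on the command
    line. An explicit value, including 0, is used as-is; only an unset
    option falls back to the ergonomic default.
    """
    if flag_value is None:
        return ergonomic_thread_count(ncpus)
    return flag_value


if __name__ == "__main__":
    print(refinement_thread_count(None, 24))  # option not specified
    print(refinement_thread_count(0, 24))     # explicit zero is honored
```

The point of the sketch is the distinction between "unset" and "explicitly zero": treating 0 as "use the default" is the behavior the change removes.
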
> Testing:
> Local specjbb2015 (Linux-x64)
> GC nightly with G1
> Aurora performance testing - no significant differences.