RFR(XS): JDK-8212122: Allow ReferenceProcessor to always be MT processing
rkennke at redhat.com
Tue Oct 16 21:07:21 UTC 2018
>> The more I think about it, the more messy this occurs to me.
>> Again, to begin with, I believe we already have what we want: GCs can
>> tell the RP if it wants mt_processing or not via the mt_processing
>> It used to be the case that RP always respected that. GCs set this to
>> false if num-gc-treads==1 or if it didn't want mt_processing for
>> other reasons (serial GC), otherwise to true, and RP would always use
>> the executor if mt_processing==true, otherwise go the single-threaded
>> With the change of using ergonomically determined num-workers, this
>> has changed to dynamically use one or the other path, which is not
>> good for GCs that actually always want mt_processing. I find this
>> surprising behaviour. I would argue that when GC tells RP to to
> I agree with that this may be surprising behavior. However you do not
> want all-MT for performance reasons - spinning up just one worker
> thread is significantly slower than just processing the few References
> inline. This particularly matters if you want very low-pause times.
Right. That's why we 'fixed' that by special-casing this in the executor.
> (This has been improved a bit with just not doing any work if the
> References are zero, which is a common case).
> So you somehow need to alternate between in-thread processing and real
> MT processing for maximum throughput, particularly in this case where
> you spin up and shut down threads a lot potentially in rapid
> Now this can be fixed _partially_ with letting the executor deciding
> whether doing the work inline or not, with the design issue mentioned
> by Kim already ("isn't this the point of run_task()?"). The other issue
> is balancing the queues which is superfluous with only one thread - but
> we've had some very subtle issues with that.
Riiight. Very good point.
> (Yeah, yeah, I do have a mostly done JDK-8202328 around, but....)
>> mt_processing, then then yes, please always go the executor/mt-
>> processing route. The executor is free to decide to not spin up
>> workers for 1-threaded-execution as optimization. Other than that, I
> Been there, done that, i.e. the programmer had been responsible for
> deciding whether to do the work inline or not. This had several issues
> (history lesson follows, just skip if not interested):
> - people forgot to do that in some places, leading to massive
> performance issues in situations the author hadn't thought of yet (like
> running on a multi-TB heap with thousands of HW threads).
> - people started optimizing for one or the other case, adding lots of
> code and so increasing maintenance costs a lot (e.g. some paths were
> basically never tested until the next crash, lots of duplicate and
> boilerplate code added etc) which made for lots of pain.
> - WorkGang itself does not give any kind of guarantees on which threads
> a particular AbstractGangTask is run in the first place, and users must
> not rely on that. Relying on that again and again just caused once-
> during-a-full-moon issues in the past; another example for a misguided
> assumption is that there is no guaranteed 1:1 relation between worker
> id passed to the AbstractGangTask and the thread's id. I.e. a single
> thread may execute an AbstractGangTask multiple times (with different
> worker_id of course), and some workgang threads may not execute any at
> all. (I assume that Shenandoah handles that case just fine?)
> Assuming otherwise is just a wrong assumption of what WorkGang provides
> to you and any code assuming so should simply be considered broken.
> These are the reasons why at some point (early JDK9 timeframe I think)
> we just removed the special casing and cleaned G1 AbstractGangTasks
> because we got weary of spending so much time on fixing these issues.
> Basically CMS' AbstractGangTasks are left to look over (which seems
> obviously broken for the RP case; but CMS'es removal keeps getting
> closer anyway).
>> believe that 1-threaded and N-threaded execution should be
>> For this reason, I still think that simply removing this surprising
>> adjustment of _mt_processing is the way to go, as proposed here:
>> If we really need to preserve the old (actually, rather new)
> This behavior is fully in-line with the guarantees of WorkGang
> (assuming that ideally you would pass a WorkGang and AbstractGangTask
> to reference processing if for a moment we ignore the Executor
> framework which is an additional wrapper for Parallel GC).
>> behaviour to execute single-threaded using the specific single-
>> threaded setup, we can do as proposed earlier:
>> and GC can override this by passing a suitable RP subclass.
>> As much as I'd like to fix this mess, and care about CMS and
>> ParallelGC, I currently don't have time for that (maybe later). What
>> I need is a way to avoid the single-threaded-path altogether, because
>> it breaks Shenandoah.
> From what I have understood so far, Shenandoah code seems to have
> jumped into the same trap, i.e. seems to assume things about WorkGang
> that it should not (which is unfortunately probably not a quick fix
> If there is really need for getting this fixed asap, please preserve
> old behavior.
> The new behavior you propose may cause significant regressions (even if
> the Executor special-cases the 1-thread case) in reference processing
> as it seems to unconditionally enable queue balancing in the MT case -
> which is rather slow (and useless to actually attempt when there is
> only one thread to do reference processing).
Thanks for the detailed explanations! I have reconsidered our problem
and found another workaround, which does not require any RP changes and
does use the separate 1-threaded and N-threaded setups for now. In the
long run we'll probably figure out something significantly better, and
until then we're good I guess. I'm withdrawing this RFR.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 833 bytes
Desc: OpenPGP digital signature
More information about the hotspot-gc-dev