RFR(XS): JDK-8212122: Allow ReferenceProcessor to always be MT processing
thomas.schatzl at oracle.com
Tue Oct 16 11:45:56 UTC 2018
On Tue, 2018-10-16 at 12:15 +0200, Roman Kennke wrote:
> The more I think about it, the more messy this occurs to me.
> Again, to begin with, I believe we already have what we want: GCs can
> tell the RP if it wants mt_processing or not via the mt_processing
> It used to be the case that RP always respected that. GCs set this to
> false if num-gc-treads==1 or if it didn't want mt_processing for
> other reasons (serial GC), otherwise to true, and RP would always use
> the executor if mt_processing==true, otherwise go the single-threaded
> With the change of using ergonomically determined num-workers, this
> has changed to dynamically use one or the other path, which is not
> good for GCs that actually always want mt_processing. I find this
> surprising behaviour. I would argue that when GC tells RP to to
I agree with that this may be surprising behavior. However you do not
want all-MT for performance reasons - spinning up just one worker
thread is significantly slower than just processing the few References
inline. This particularly matters if you want very low-pause times.
(This has been improved a bit with just not doing any work if the
References are zero, which is a common case).
So you somehow need to alternate between in-thread processing and real
MT processing for maximum throughput, particularly in this case where
you spin up and shut down threads a lot potentially in rapid
Now this can be fixed _partially_ with letting the executor deciding
whether doing the work inline or not, with the design issue mentioned
by Kim already ("isn't this the point of run_task()?"). The other issue
is balancing the queues which is superfluous with only one thread - but
we've had some very subtle issues with that.
(Yeah, yeah, I do have a mostly done JDK-8202328 around, but....)
> mt_processing, then then yes, please always go the executor/mt-
> processing route. The executor is free to decide to not spin up
> workers for 1-threaded-execution as optimization. Other than that, I
Been there, done that, i.e. the programmer had been responsible for
deciding whether to do the work inline or not. This had several issues
(history lesson follows, just skip if not interested):
- people forgot to do that in some places, leading to massive
performance issues in situations the author hadn't thought of yet (like
running on a multi-TB heap with thousands of HW threads).
- people started optimizing for one or the other case, adding lots of
code and so increasing maintenance costs a lot (e.g. some paths were
basically never tested until the next crash, lots of duplicate and
boilerplate code added etc) which made for lots of pain.
- WorkGang itself does not give any kind of guarantees on which threads
a particular AbstractGangTask is run in the first place, and users must
not rely on that. Relying on that again and again just caused once-
during-a-full-moon issues in the past; another example for a misguided
assumption is that there is no guaranteed 1:1 relation between worker
id passed to the AbstractGangTask and the thread's id. I.e. a single
thread may execute an AbstractGangTask multiple times (with different
worker_id of course), and some workgang threads may not execute any at
all. (I assume that Shenandoah handles that case just fine?)
Assuming otherwise is just a wrong assumption of what WorkGang provides
to you and any code assuming so should simply be considered broken.
These are the reasons why at some point (early JDK9 timeframe I think)
we just removed the special casing and cleaned G1 AbstractGangTasks
because we got weary of spending so much time on fixing these issues.
Basically CMS' AbstractGangTasks are left to look over (which seems
obviously broken for the RP case; but CMS'es removal keeps getting
> believe that 1-threaded and N-threaded execution should be
> For this reason, I still think that simply removing this surprising
> adjustment of _mt_processing is the way to go, as proposed here:
> If we really need to preserve the old (actually, rather new)
This behavior is fully in-line with the guarantees of WorkGang
(assuming that ideally you would pass a WorkGang and AbstractGangTask
to reference processing if for a moment we ignore the Executor
framework which is an additional wrapper for Parallel GC).
> behaviour to execute single-threaded using the specific single-
> threaded setup, we can do as proposed earlier:
> and GC can override this by passing a suitable RP subclass.
> As much as I'd like to fix this mess, and care about CMS and
> ParallelGC, I currently don't have time for that (maybe later). What
> I need is a way to avoid the single-threaded-path altogether, because
> it breaks Shenandoah.
>From what I have understood so far, Shenandoah code seems to have
jumped into the same trap, i.e. seems to assume things about WorkGang
that it should not (which is unfortunately probably not a quick fix
If there is really need for getting this fixed asap, please preserve
The new behavior you propose may cause significant regressions (even if
the Executor special-cases the 1-thread case) in reference processing
as it seems to unconditionally enable queue balancing in the MT case -
which is rather slow (and useless to actually attempt when there is
only one thread to do reference processing).
More information about the hotspot-gc-dev