RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode
john.r.rose at oracle.com
Tue Jan 26 22:48:22 UTC 2016
What I would like to see is for users to feel free to use CallSites
with any amount of mutability, and have the JVM pick a good
strategy for speculating and optimizing through CS target bindings.
By "good" I mean that, if the CS is not megamutable, you get
performance comparable to an "invokestatic". But if the
CS *is* megamutable (unstable), it is not "good" (IMO) to issue
a storm of recompilations, especially if (as is usually the case)
the megamutable CS is one of 1000s of other call sites in the
same code blob, all of which must be recompiled because one
CS had a problem.
Instead, the megamutable CS should be downgraded to an
indirect call through a normal (or volatile) variable.
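To make the "downgraded" shape concrete, here is a small sketch using the real java.lang.invoke API. MutableCallSite.setTarget has plain (non-volatile) memory semantics and VolatileCallSite has volatile semantics, matching the "normal (or volatile) variable" above; the class and method names below are illustrative, since the JVM's internal de-optimized code shape is of course not exposed as Java source:

```java
import java.lang.invoke.*;

// Sketch: a mutable call site whose target is re-read on every call
// (an indirect call through a variable), rather than being inlined.
public class IndirectSite {
    static int one(int x) { return x + 1; }
    static int two(int x) { return x + 2; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType type = MethodType.methodType(int.class, int.class);

        // The site starts out bound to one(), but may be retargeted at
        // any time via setTarget.
        MutableCallSite site = new MutableCallSite(
                lookup.findStatic(IndirectSite.class, "one", type));

        // dynamicInvoker() yields a handle that reloads the current
        // target on each invocation -- the indirect-call slow path.
        MethodHandle invoker = site.dynamicInvoker();

        System.out.println((int) invoker.invokeExact(10)); // 11
        site.setTarget(lookup.findStatic(IndirectSite.class, "two", type));
        System.out.println((int) invoker.invokeExact(10)); // 12
    }
}
```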
So does this leave some performance on the floor? Of course;
perhaps the CS finally settles down long enough for the JVM
to venture a profitable recompilation, and for the cost of recompilation
to be paid off by further stability and efficient execution of the CS.
My main point here is that reoptimization of megamutables is
a misuse of speculation. I'm not saying that the JIT should have
a tantrum and refuse to compile the call site (which is a bug),
but it should stop speculating that it is stable when in fact it is not.
There are lots of ways to improve the performance of megamutables,
but unconditional recompilation is not one of those ways. It uses
a wrecking ball to swat a fly.
Handling megamutables is very much like handling megamorphics.
You want to hang on to the hope that there are really just a few
branches (common case) and optimize those, and call out-of-line
for the rest. If that hope fails, you call out-of-line always. And
you want to detect if the statistics change, where the entropy of
the CS target goes down to a small number, so you can venture
another recompile with up-to-date speculation. We should apply
these techniques to both megamorphics and megamutables.
So there's an ambiguity in the contract: Is CS speculation just
a best-efforts kind of thing, or is the JVM contracted to mechanically
recompile on every CS change? I think the reasonable reading
of the javadoc (etc.) is the first, not the second.
How would a user communicate that his CS is a special one,
whose invalidation should *always* trigger reoptimization?
I don't know, maybe an integer-valued callback that is triggered
during setTarget calls, and returns the amount of (virtual)
time before the next reoptimization should be attempted.
The callback would be passed the number of previous
reoptimizations (at this site or in the whole method or
both), as a warning of how resource-intensive this CS
is becoming. Returning constant zero means the
current behavior. I think you can see lots of problems
with such an API.
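For concreteness, here is what such a callback might look like. This is purely hypothetical; no such API exists in java.lang.invoke, and the names RelinkPolicy, onRetarget, and BackoffPolicy are invented for illustration:

```java
import java.lang.invoke.*;

// Hypothetical callback consulted by the JVM during setTarget.
interface RelinkPolicy {
    // priorReopts: how many reoptimizations this site has already cost.
    // Returns the amount of virtual time (say, invocation count) to wait
    // before the next reoptimization attempt; zero means "reoptimize
    // eagerly", i.e., the current behavior.
    long onRetarget(CallSite site, int priorReopts);
}

// One plausible policy: back off exponentially as a site proves itself
// mega-mutable, so one bad actor cannot monopolize the JIT.
class BackoffPolicy implements RelinkPolicy {
    public long onRetarget(CallSite site, int priorReopts) {
        // 0, 1000, 2000, 4000, ... invocations between attempts.
        return priorReopts == 0 ? 0 : 1000L << Math.min(priorReopts - 1, 20);
    }
}

public class PolicyDemo {
    public static void main(String[] args) {
        RelinkPolicy p = new BackoffPolicy();
        System.out.println(p.onRetarget(null, 0)); // 0
        System.out.println(p.onRetarget(null, 1)); // 1000
        System.out.println(p.onRetarget(null, 3)); // 4000
    }
}
```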
And, I think that sort of thing isn't notably better than simple
JVM heuristics. Here's how I think we should fix the problem:
1. Speculate at first that a CS is immutable.
2. If that fails, speculate that it is stable, as:
if (cs.t == expected) inline expected(); else outline cs.t();
Collect a profile count along the outline path.
3. Every once in a while, if a code blob is accumulating
outline counts, queue it for reoptimization.
Crucially, do this in such a way that the JIT does
not become a foreground consumer of CPU cycles.
4. When recompiling a stable call site, always
inline the current target ("this time fer sure!").
Maybe if this is a *really* bad actor (but how
can you tell?) forget the speculation part.
5. Maybe, speculate on the LF of the target,
not the target itself, to allow some degree of
harmless variation by targets. (For some
codes that will help, although it interacts
with MH customization in tricky ways.)
6. Maybe fiddle with collecting previous hot targets,
or (better) empower the JDK code to manage that stuff.
PIC logic should be handled at the JDK level,
not in the JIT.
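Point 6 can be sketched with existing java.lang.invoke combinators: a language runtime (not the JIT) can build a small PIC by chaining MethodHandles.guardWithTest over cached targets, with an out-of-line fallback. The structure below is illustrative, not taken from any JDK class, and it guards on an argument value for simplicity where a real PIC would guard on a receiver's class:

```java
import java.lang.invoke.*;

// Sketch of JDK-level PIC logic: two cached entries, then a slow path.
public class MiniPIC {
    static String hello(String s) { return "hello " + s; }
    static String bye(String s)   { return "bye " + s; }

    // Guard: does the incoming argument match the cached key?
    static boolean isExpected(String expected, String actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType t = MethodType.methodType(String.class, String.class);
        MethodHandle hello = l.findStatic(MiniPIC.class, "hello", t);
        MethodHandle bye   = l.findStatic(MiniPIC.class, "bye", t);
        MethodHandle guard = l.findStatic(MiniPIC.class, "isExpected",
                MethodType.methodType(boolean.class, String.class, String.class));

        // Slow path: stands in for the indirect out-of-line call.
        MethodHandle slowPath = MethodHandles.dropArguments(
                MethodHandles.constant(String.class, "miss"), 0, String.class);

        // if (arg == "a") hello(arg);
        // else if (arg == "b") bye(arg);
        // else slowPath(arg);
        MethodHandle pic = MethodHandles.guardWithTest(
                guard.bindTo("a"), hello,
                MethodHandles.guardWithTest(guard.bindTo("b"), bye, slowPath));

        System.out.println((String) pic.invokeExact("a")); // hello a
        System.out.println((String) pic.invokeExact("b")); // bye b
        System.out.println((String) pic.invokeExact("c")); // miss
    }
}
```

A runtime would install such a chain as the target of a MutableCallSite and rebuild it as new targets are observed, which keeps the PIC policy in JDK-level code where it can be tuned per language.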
Anyway, if the above gets addressed eventually,
or if the rest of the MLVM crew proves that I don't
know what I'm talking about, I'm OK with this fix.
"Reviewed", assuming future improvements.
On Jan 20, 2016, at 3:54 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> John, Chris, thanks for the feedback.
> I don't think it is only about microbenchmarks. Long-running large applications with lots of mutable call sites should also benefit from this change. Current JVM behavior counts invalidations on the root method, so nmethods with multiple mutable call sites (from the root & all inlined callees) are more likely to hit the limit, even if there are no mega-mutable sites. It just sums up, and PerMethodRecompilationCutoff (= 400, by default) doesn't look like a huge number.
> Also, LambdaForm sharing somewhat worsens the situation. When LambdaForms were mostly customized, different method handle chains were compiled into a single nmethod. Right now, it means that not only is the root method always interpreted, but all bound method handle chains are broken into numerous per-LF nmethods (see JDK-8069591 for some details).
> MLVM folks, I'd like to hear your opinion about what kind of behavior you expect from the JVM w.r.t. mutable call sites.
> There are valid use-cases when JVM shouldn't throttle the recompilation (e.g., long-running application with indy-based dynamic tracing). Maybe there's a place for a new CallSite flavor to clearly communicate application expectations to the JVM? Either always recompile (thus eventually reaching peak performance) or give up and generate less efficient machine code, but save on possible recompilations.
> Best regards,
> Vladimir Ivanov
> On 1/20/16 2:37 AM, John Rose wrote:
>> On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov
>> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>> wrote:
>>> The fix is to avoid updating recompilation count when corresponding
>>> nmethod is invalidated due to a call site target change.
>> Although I'm not vetoing it (since it seems it will help customers in
>> the short term), I'm uncomfortable with this fix because it doesn't
>> scale to large dyn. lang. applications with many unstable call sites.
>> Put another way, it feels like we are duct-taping down a failsafe
>> switch (against infinite recompilation) in order to spam a
>> micro-benchmark: a small number of mega-mutable call sites for which we
>> are willing to spend (potentially) all of the JIT resources, including
>> those usually allocated to application performance in the steady state.
>> Put a third way: I am not comfortable with unthrottled infinite
>> recompilation as a performance strategy.
>> I've commented on the new RFE (JDK-8147550) where to go next, including
>> the following sentiments:
>>> There is a serious design tension here, though: Some users apparently
>>> are willing to endure an infinite series of recompilations as part of
>>> the cost of doing business; JDK-7177745 addresses this need by turning
>>> off the fail-safe against (accidental, buggy) infinite recompilation
>>> for unstable CSs. Other users might find that having a percentage of
>>> machine time devoted to recompilation is a problem. (This has been the
>>> case in the past with non-dynamic languages, at least.) The code shape
>>> proposed in this bug report would cover all simple unstable call
>>> sites (bi-stable, for example, would compile to a bi-morphic call),
>>> but, in pathological cases (infinite sequence of distinct CS targets)
>>> would "settle down" into a code shape that would be sub-optimal for
>>> any single target, but (as an indirect MH call) reasonable for all the
>>> targets together.
>>> In the absence of clear direction from the user or the profile, the
>>> JVM has to choose infinite recompilation or a good-enough final
>>> compilation. The latter choice is safer. And the
>>> infinite recompilation is less safe because there is no intrinsic
>>> bound on the amount of machine cycles that could be diverted to
>>> recompilation, given a dynamic language application with
>>> enough mega-mutable CSs. Settling down to a network of indirect calls
>>> has a bounded cost.
>>> Yes, one size-fits-all tactics never please everybody. But the JVM
>>> should not choose tactics with unlimited downsides.
>> — John