Fwd: Tiered compilation and virtual call heuristics
vladimir.kozlov at oracle.com
Thu Jul 30 02:08:56 UTC 2015
The main issue here is that without Tiered the Interpreter starts collecting
profiling information only after 3300 invocations
(InterpreterProfilePercentage). As a result, data from the first invocations is
not collected.
On the other hand, with Tiered, C1 compilation (with profiling code) is
triggered after 100 invocations. So you have a lot more data as you
If you can sacrifice some startup performance, you can try to use
CompileThresholdScaling to increase compilation thresholds to delay
Or you can also try to increase Tier3InvocationThreshold and
Tier3CompileThreshold to delay only C1 compilation:
Here is the formula from simpleThresholdPolicy.inline.hpp:
  return (i >= Tier3InvocationThreshold * scale) ||
         (i >= Tier3MinInvocationThreshold * scale &&
          i + b >= Tier3CompileThreshold * scale);
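To make the effect of raising the thresholds concrete, here is a minimal sketch of that predicate in Java. It is not HotSpot code: the class and method names are hypothetical, and the three constants are what I assume the JDK 8 defaults to be (check your build with -XX:+PrintFlagsFinal). Here i is the invocation count, b is the backedge count, and scale is the threshold multiplier.

```java
public class Tier3Predicate {
    // Assumed JDK 8 defaults, not read from a running VM.
    static final int TIER3_INVOCATION_THRESHOLD = 200;
    static final int TIER3_MIN_INVOCATION_THRESHOLD = 100;
    static final int TIER3_COMPILE_THRESHOLD = 2000;

    // Mirrors the quoted formula: i = invocations, b = backedges,
    // scale = multiplier applied to every threshold.
    public static boolean shouldCompileAtTier3(int i, int b, double scale) {
        return (i >= TIER3_INVOCATION_THRESHOLD * scale)
            || (i >= TIER3_MIN_INVOCATION_THRESHOLD * scale
                && i + b >= TIER3_COMPILE_THRESHOLD * scale);
    }

    public static void main(String[] args) {
        // Enough invocations at the unscaled thresholds.
        System.out.println(shouldCompileAtTier3(200, 0, 1.0));
        // Same counters, thresholds doubled: compilation is delayed.
        System.out.println(shouldCompileAtTier3(200, 0, 2.0));
        // Fewer invocations, but a loop-heavy method still qualifies.
        System.out.println(shouldCompileAtTier3(150, 1900, 1.0));
    }
}
```

With these assumed defaults, doubling the scale means a method with 200 invocations and no loops no longer qualifies, which is the delay described above.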
But if you have a truly "flat" profile (all called methods are relatively
warm) nothing will help you.
If you have some methods which are relatively hot, you can work around that by
calling them at the beginning. For example, if you had
count400(0) called first (or second), you would get a record for it in the MDO.
Then you can try to lower TypeProfileMajorReceiverPercent to avoid
a virtual call at least for one hot method (recorded in the MDO):
  product(intx, TypeProfileMajorReceiverPercent, 90,
          "% of major receiver type to all profiled receivers")
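As a rough illustration of why call order and this percentage matter, here is a simplified Java model of a call-site receiver profile. It is a sketch under stated assumptions, not the MDO layout: the class ReceiverProfileSketch and its methods are hypothetical, the width of 2 stands in for the limit mentioned in the original mail, and I assume that a receiver which finds no free row only increases the total count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReceiverProfileSketch {
    private final int width;  // number of receiver rows at this call site
    private final Map<String, Long> rows = new LinkedHashMap<>();
    private long total;       // all profiled calls, recorded or not

    public ReceiverProfileSketch(int width) { this.width = width; }

    public void record(String receiverType) {
        total++;
        if (rows.containsKey(receiverType)) {
            rows.merge(receiverType, 1L, Long::sum);
        } else if (rows.size() < width) {
            rows.put(receiverType, 1L);  // first-come types claim the rows
        }
        // Otherwise no row is free: only the total is increased.
    }

    // Returns the recorded type whose share of all calls reaches `percent`,
    // or null when no recorded type is dominant enough.
    public String majorReceiver(int percent) {
        String best = null;
        long bestCount = 0;
        for (Map.Entry<String, Long> e : rows.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        if (best == null) return null;
        return (100.0 * bestCount / total >= percent) ? best : null;
    }

    public static void main(String[] args) {
        // Hot type called first, so it owns a row before the others show up.
        ReceiverProfileSketch site = new ReceiverProfileSketch(2);
        site.record("Instance$1");
        site.record("Instance$3");
        for (int n = 0; n < 800; n++) site.record("Instance$1");
        for (int n = 0; n < 100; n++) site.record("Instance$4");
        System.out.println(site.majorReceiver(90));  // share is about 88.8%
        System.out.println(site.majorReceiver(80));
    }
}
```

In this model the hot type has a share of 801 out of 902 calls, so it misses the default 90 but passes a lowered value such as 80. If the two rows had been taken by other types first, the hot type would never be recorded and no percentage would help.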
On 7/22/15 10:37 AM, Carsten Varming wrote:
> Dear Hotspot compiler group,
> I have had a few issues with tiered compilation in JDK8 lately and was
> wondering if you have some comments or ideas for the given problem.
> Here is my problem as I currently understand it. Feel free to correct
> any misunderstandings I may have. With tiered compilation the heuristics
> for inlining virtual calls seem to degrade quite a bit. I think this is
> due to MethodData objects being created much earlier with tiered than
> without. This causes the tracking of the hottest target methods at a
> virtual call site to go awry, due to the limit (2) on the number of
> MethodData objects that can be associated with a bci in a method. It
> seems like the only virtual call targets tracked are the targets that
> are warm when C1 is invoked.
> The program ends up with all call-sites in
> scala.collection.IndexedSeqOptimized.slice using virtual dispatch with
> tiered and bimorphic call sites without tiered. The end result with
> tiered is a tripling of the cpu required to run the program, and
> instruction pointers from the compiled slice method end up in 90% of all
> cpu samples (collected with perf at 4kHz).
> The problem is with a small application built in Scala on top of Netty.
> I have written a small sample program (see attached Main.java) to spare
> you the details (and to be able to give you code).
> When I run the sample program with tiered, the call to count ends up
> being a virtual call, due to Instance$3.count and Instance$4.count being
> warm when C1 kicks in. Without tiered Instance$1.count is the only hot
> I wonder if you guys have seen this problem in the wild or if I just
> happen to be unlucky. Increasing BciProfileWidth should help in my case,
> but it is not a product flag. Do you have any experience regarding cost
> of increasing BciProfileWidth? Do you have any thoughts on throwing out
> MethodData objects for virtual call sites that turn out to be pretty cold?
> Thank you in advance for your thoughts,