RFR: 8221542: ~15% performance degradation due to less optimized inline decision
fujie at loongson.cn
Sat Mar 30 10:05:34 UTC 2019
I appreciate your suggestion.
I agree with you that to solve these kinds of problems completely, the
compile policy needs to be carefully co-designed with the inline
But it is really a big topic which deserves years of research.
I have updated the patch based on your advice.
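To make the suggested direction concrete, here is a rough, language-neutral sketch of the decision it describes. It is not the patch and not HotSpot code. The `CallSite` type, the `should_consider_inlining` function, and the `init_overrules_profile` switch are hypothetical names used only for illustration. Only `is_init_with_ea` and the "profile.count > 0" condition come from the discussion quoted below.

```python
from dataclasses import dataclass


@dataclass
class CallSite:
    """Hypothetical model of a call site as seen by the inliner."""
    profile_count: int      # profiled invocation count for this call site
    is_init_with_ea: bool   # stands in for the is_init_with_ea check


def should_consider_inlining(site: CallSite,
                             init_overrules_profile: bool = True) -> bool:
    """Return True if the call site passes the 'reached' filter.

    With init_overrules_profile=False, an unreached call site
    (profile_count == 0) is always rejected. With it set to True
    (the suggested tweak, as modeled here), an is_init_with_ea call
    site is kept even when the immature profile reports a count of 0.
    """
    if init_overrules_profile and site.is_init_with_ea:
        return True
    return site.profile_count > 0


# A constructor call site whose profile is still empty.
ctor_site = CallSite(profile_count=0, is_init_with_ea=True)
# An ordinary call site whose profile is still empty.
plain_site = CallSite(profile_count=0, is_init_with_ea=False)
```

In this model, `ctor_site` is rejected when the switch is off and kept when it is on, while `plain_site` is rejected either way, so the change stays limited to the constructor case.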
- Running scimark.monte_carlo on jdk/x64 and jdk8u/mips64 with
-XX:-TieredCompilation: no performance drop
- Running SPECjvm2008 on jdk8u/mips64 with -XX:-TieredCompilation: no
- Running make test TEST="micro" on jdk/x64: no performance regression
- Running make test TEST="tier1 tier2 tier3" JTREG="JOBS=3"
CONF=release on jdk/x64: no regression
Could you please review it and give me some advice?
Thanks a lot.
On 2019/3/28 2:21 PM, Vladimir Ivanov wrote:
> Hi Jie,
> The heuristic quirk looks very similar to the one Sergey reported
> Overall, tweaking the heuristic to favor inlining doesn't look like the
> right thing here. profile.count=0 is a sign the profile isn't mature
> enough and it's likely the callee doesn't have enough profiling info
> as well. (And that's what Sergey observed on some of the
> microbenchmarks during his experiments.)
> In your particular case (Random::<init>), tweaking the heuristic so
> is_init_with_ea overrules "profile.count > 0" may be a more
> promising approach. After all, the fact that the call site is being
> considered for inlining (and not pruned along with the basic block it
> belongs to) is a strong signal in favor of "profile.count > 0" case.
> (Though it's not guaranteed due to the immaturity of profile data.)
> But IMO the root problem is that top-tier compilation happens too
> early: profile data isn't mature enough yet and it will easily lead to
> similar problems later (during compilation).
> Best regards,
> Vladimir Ivanov
> On 27/03/2019 03:15, Jie Fu wrote:
>> Hi all,
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8221542
>> ## Symptom
>> ~15% performance degradation (from 700 ops/m to 600 ops/m) was
>> observed randomly on x86 while running SPECjvm2008's
>> scimark.monte_carlo with -XX:-TieredCompilation.
>> ## Reproduce
>> It can always be reproduced with the script in less than 5 minutes.
>> ## Reason
>> The drop was caused by a not-inline decision on
>> spec.benchmarks.scimark.utils.Random::<init> in
>> ## Fix
>> It might be better to make a little change to the inline heuristic.
>> For callers without loops, the original heuristic works fine.
>> But for callers with loops, it would be better to make a not-inline
>> decision more conservatively.
>> ## Testing
>> - Running scimark.monte_carlo on jdk/x64 with -XX:-TieredCompilation
>> about 5000 times, no performance drop
>> Also on jdk8u/mips64 with -XX:-TieredCompilation, no performance drop
>> - Running make test TEST="micro" on jdk/x64, no performance regression
>> - Running SPECjvm2008 on jdk8u/x64 with -XX:-TieredCompilation, no
>> performance regression
>> For more detailed info, please see the JBS.
>> Could you please review it?
>> Thanks a lot.
>> Best regards,
>>  http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/reproduce.sh