RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
vladimir.x.ivanov at oracle.com
Fri Jan 16 17:16:22 UTC 2015
After GuardWithTest (GWT) LambdaForms became shared, profile pollution
significantly distorted compilation decisions. It affected inlining and
hindered some optimizations, causing significant performance
regressions for Nashorn (on Octane benchmarks).
Inlining was fixed by 8059877, but that fix didn't cover the case when a
branch is never taken. A non-pruned branch can cause missed optimization
opportunities, not just an increase in code size. For example, a non-pruned
branch can break
Currently, there are 2 problems:
- branch frequencies profile pollution
- deoptimization counts pollution
Branch frequency pollution hides from the JIT the fact that a branch is
never taken. Since GWT LambdaForms (and hence their bytecode) are
heavily shared, but the behavior is specific to each MethodHandle, there's no
way for the JIT to understand how a particular GWT instance behaves.
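To make the pollution concrete, here is a minimal sketch in plain Java. It is not the JDK code: the class name, the guard() helper, and the two-slot counter layout (index 0 for "test was false", index 1 for "test was true") are all assumptions made for illustration. Two guards with opposite behavior share one profile, and the shared counts hide the fact that each guard has a branch it never takes.

```java
import java.util.Arrays;

public class ProfilePollutionSketch {

    /**
     * Hypothetical guard. Records the test outcome both in the profile
     * attached to the shared code and in a per-instance profile.
     * Index 0 counts "false" outcomes, index 1 counts "true" outcomes.
     */
    static boolean guard(boolean testResult, int[] sharedCounts, int[] ownCounts) {
        int idx = testResult ? 1 : 0;
        sharedCounts[idx]++;
        ownCounts[idx]++;
        return testResult;
    }

    public static void main(String[] args) {
        int[] shared = new int[2];       // what a profile on shared bytecode would see
        int[] alwaysTrue = new int[2];   // per-instance view of the first guard
        int[] alwaysFalse = new int[2];  // per-instance view of the second guard

        for (int i = 0; i < 1000; i++) {
            guard(true, shared, alwaysTrue);
            guard(false, shared, alwaysFalse);
        }

        // The shared profile looks like a 50/50 branch, while each
        // per-instance profile has one slot that is still zero.
        System.out.println("shared      = " + Arrays.toString(shared));
        System.out.println("alwaysTrue  = " + Arrays.toString(alwaysTrue));
        System.out.println("alwaysFalse = " + Arrays.toString(alwaysFalse));
    }
}
```

With only the shared counts available, neither branch looks prunable, even though each individual guard has one.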
The solution I propose is to do profiling in Java code and feed it to
the JIT. Every GWT MethodHandle holds an auxiliary int array where
profiling info is stored. Once the JIT kicks in, it can retrieve these
counts if the corresponding MethodHandle is a compile-time constant (which
is usually the case). To communicate the profile data from Java code to
the JIT, MethodHandleImpl::profileBranch() is used.
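The idea of per-MethodHandle profiling can be sketched in plain Java as follows. This is an illustrative model, not the code under review: the ProfiledGuard class, the signature given here to profileBranch(), the two-element counter array, and the saturating increment are all assumptions. In the real fix the JIT treats the profiling call specially. Here it is an ordinary method.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

public class BranchProfileSketch {

    /**
     * Hypothetical stand-in for MethodHandleImpl::profileBranch().
     * Bumps the counter for the observed outcome and passes the result through.
     * counters[0] counts "false", counters[1] counts "true" (assumed layout).
     */
    static boolean profileBranch(boolean result, int[] counters) {
        int idx = result ? 1 : 0;
        if (counters[idx] < Integer.MAX_VALUE) {
            counters[idx]++;   // saturate instead of overflowing
        }
        return result;
    }

    /** Hypothetical model of one GWT instance that owns its own profile. */
    static final class ProfiledGuard {
        final int[] counters = new int[2];   // per-instance, never shared
        final IntPredicate test;
        final IntUnaryOperator target;
        final IntUnaryOperator fallback;

        ProfiledGuard(IntPredicate test, IntUnaryOperator target, IntUnaryOperator fallback) {
            this.test = test;
            this.target = target;
            this.fallback = fallback;
        }

        int invoke(int x) {
            return profileBranch(test.test(x), counters)
                    ? target.applyAsInt(x)
                    : fallback.applyAsInt(x);
        }

        /** True if this instance has run but never reached its fallback branch. */
        boolean fallbackNeverTaken() {
            return counters[0] == 0 && counters[1] > 0;
        }
    }

    public static void main(String[] args) {
        ProfiledGuard g = new ProfiledGuard(x -> x >= 0, x -> x + 1, x -> -x);
        for (int i = 0; i < 10_000; i++) {
            g.invoke(i);
        }
        System.out.println("fallback never taken: " + g.fallbackNeverTaken());
    }
}
```

Because the counters live in the guard instance rather than next to the shared bytecode, a compiler that sees the instance as a constant could read them without interference from other guards.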
If the GWT MethodHandle isn't a compile-time constant, profiling should
proceed. This happens when the corresponding LambdaForm is already shared: for
newly created GWT MethodHandles, profiling can occur only in native code
(a dedicated nmethod for a single LambdaForm). So, when compilation of the
whole MethodHandle chain is triggered, the profile should be already
Overriding branch frequencies is not enough. Statistics on
deoptimization events are also polluted. Even if a branch is never taken,
the JIT issues an uncommon trap there only if the corresponding bytecode
doesn't trap too much and doesn't cause too many recompiles.
I added @IgnoreProfile and placed it only on GWT LambdaForms. When the JIT
sees it on a method, Compile::too_many_traps and
Compile::too_many_recompiles always return false for that method. This
allows the JIT to prune the branch based on the custom profile and to
recompile the method if the branch is visited.
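The decision described above can be modeled in a few lines of Java. This is a toy model only: the real checks live in HotSpot's C++ code, and the method shapes, the TRAP_LIMIT value, and the boolean ignoreProfile flag used here are hypothetical.

```java
public class PruneDecisionSketch {

    // Hypothetical threshold. The real limits are HotSpot-internal.
    static final int TRAP_LIMIT = 4;

    /**
     * Toy version of a "too many traps" check. With ignoreProfile set, the
     * (possibly polluted) deoptimization count is disregarded entirely.
     */
    static boolean tooManyTraps(int recordedTraps, boolean ignoreProfile) {
        return !ignoreProfile && recordedTraps >= TRAP_LIMIT;
    }

    /**
     * A branch is pruned (replaced by an uncommon trap) only if the custom
     * profile says it was never taken and the trap check does not veto it.
     */
    static boolean canPrune(int branchCount, int recordedTraps, boolean ignoreProfile) {
        return branchCount == 0 && !tooManyTraps(recordedTraps, ignoreProfile);
    }

    public static void main(String[] args) {
        int pollutedTraps = 100;  // traps recorded by other users of the shared code
        System.out.println("without flag: " + canPrune(0, pollutedTraps, false));
        System.out.println("with flag:    " + canPrune(0, pollutedTraps, true));
    }
}
```

In this model a polluted trap count blocks pruning of a never-taken branch unless the flag is set, which mirrors the motivation for the annotation.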
For now, I wanted to keep the fix very focused. The next thing I plan to
do is to experiment with ignoring deoptimization counts for other
LambdaForms which are heavily shared. I have already seen problems caused by
deoptimization count pollution (see JDK-8068915).
I plan to backport the fix into 8u40, once I finish extensive
Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane).
PS: as a summary, my experiments show that the fixes for 8063137 & 8068915
almost completely recover peak performance after LambdaForm sharing.
There's one more problem left (non-inlined MethodHandle invocations
are more expensive when LFs are shared), but it's a story for another day.
8059877: GWT branch frequencies pollution due to LF sharing
JEP 210: LambdaForm Reduction and Caching