RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type
tobias.hartmann at oracle.com
Thu Jan 14 16:00:36 UTC 2016
please review the following patch.
The problem is that ConvI2L nodes with a narrow type (used to convert integer array indices to long values) are not dependent on the corresponding range check that proves that the input value is always in the (integer-)range. As a result, the ConvI2L node may flow above the range check during loop optimizations and end up with an input that is not in its type range. The node is then replaced by TOP causing the data path to be eliminated. However, because there is no control dependency on the corresponding range check, the control path from the peeled iteration that uses the result of the ConvI2L may not be eliminated. We crash because we are potentially using a value that is not available.
For example, TestLoopPeeling::testArrayAccess() triggers loop peeling because the loop contains an invariant check. The array store in line 66 is moved out of the loop and reachable from the peeled and old iterations of the loop. However, the array index computation consisting of a LShiftL(ConvI2L(Phi)) remains in each loop because it has loop variant usages and is not dependent on the range check that was moved out of the loop. The peeled iteration of the loop uses storeIndex == -1 causing the ConvI2L to be replaced by TOP because -1 is not in its [0, MAX_INT] range. The TOP is propagated downwards and ends up as one of the inputs to the Phi that merges the array index from the peeled and old loop exits. The Phi replaced by it's only remaining input and the store ends up using the index from the old iteration although it's still reachable from the peeled iteration. We crash because we potentially use the index value from the old iteration while coming from the peeled iterat!
ion (of co
urse, the range check would catch this at runtime).
This problem may show up with array accesses but also with other code for which we emit a ConvI2L node with a narrow type. For example, array allocation uses a ConvI2L to convert the integer array size to a long value (see TestLoopPeeling::testArrayAllocation). We solved several different instances of this problem in the past with "workaround-fixes" that just disabled loop optimizations in special cases (see below). Such a workaround fix is not feasible to fix all potential occurrences of this problem. TestLoopPeeling.java crashes JDK 7, 8 and 9.
To make the ConvI2L dependent on a range check, I added code to emit a narrow CastII node with a control dependency on the range check that is then used as input to the ConvI2L. Like this, we explicitly express the dependency and prevent loop optimizations from moving the ConvI2L above the range check.
To make sure that the impact is as small as possible, the range check dependent CastII nodes are removed right after loop optimizations. Further, all optimizations that depend on the old shape of array address computations are adapted to be aware of the CastII node.
With the fix, we could now remove the following old "workaround-fixes":
For reference, the individual patches can be found here:
However, performance evaluation showed that backing out the old fixes causes significant regressions. It seems that aggressive splitting of ConvI2L nodes through phis leads to less optimal code due to more register spilling. I suspect that additional changes to the loop optimizations are necessary and would therefore like to leave the workaround fixes in for now. I filed JDK-8145313 to remove them later. Like this, we also reduce the impact/risk when backporting this fix to JDK 8 and potentially JDK 7.
Roland pointed out that the changes in ConvI2LNode::Ideal() could potentially be merged into the CastIINode::Ideal() optimization introduced by his fix for JDK-8145322. After some investigation it turned out that the CastII optimization does not only affect memory addressing but also other CastII(AddI(..)) graph shapes. Making it more generic has a broader impact and therefore needs more investigation. I filed JDK-8147394 for this.
ConvI2L nodes with a narrow type are also emitted by intrinsics:
I was not able to reproduce the problem with intrinsics. It's also not easily possible to make the CastII node range check dependent here because the range check is not always available from within the intrinsic.
I did extensive testing to make sure the fix does not introduce correctness or performance issues.
- Different RBT test suites  with and without -Xcomp.
- Full run of multiple CTW suites.
- Verified changes in "PhaseIdealLoop::match_fill_loop" (loopTransform.cpp) by manually checking the output of  with -XX:+TraceOptimizeFill.
- Verified changes in "IfNode::improve_address_types" (ifnode.cpp) by manually checking the output of  with -XX:+PrintOptoAssembly to make sure all range checks are folded.
- Verified changes in superword.cpp by comparing output with -XX:+TraceSuperWord.
- Performance runs (Footprint, JMH-Javac, SPECjbb2005, SPECjvm2008, Startup, Volano) on x86 and SPARC showed no regression
 RBT test suites:
Only without -Xcomp:
More information about the hotspot-compiler-dev