[8u, 9] RFR(S): 8177095: Range check dependent CastII/ConvI2L is prematurely eliminated
vladimir.kozlov at oracle.com
Thu Mar 23 16:10:35 UTC 2017
On 3/23/17 4:49 AM, Tobias Hartmann wrote:
> Hi Vladimir,
> thanks for looking at this!
> On 22.03.2017 18:34, Vladimir Kozlov wrote:
>> Changes seems fine but I am concern about performance impact.
> I'll do a performance run to check if there is a regression.
>> What happens with graph with this change? Is it folded at the end even if we keep CastII? Skipping CastII in ConvI2L was one of optimizations we did to remove unneeded Cast nodes for 6675699 changes.
> 6675699 did not change ConvI2LNode::Ideal() to remove unneeded CastII nodes. The code was added to be able to apply the ConvI2L(AddI(x, y)) optimization even if a range check dependent CastII is present. CastII nodes are not removed but only "pushed through".
> Range check dependent CastII nodes are always removed after loop optimizations (see Compile::remove_range_check_casts()). If the ConvI2L(AddI(x, y)) shape is still there, i.e. if the AddI was not split through the loop induction Phi, above optimization will be performed and the index computation will be folded into an addressing mode.
> The optimization in CastIINode::Ideal() was added by JDK-8145322 and partially removed with JDK-8165661  due to performance problems. Restricting the optimization to non range check dependent CastIIs should not affect performance in the cases targeted by JDK-8145322.
> I did some manual inspection of the graph/assembly and it looked good. Let's wait for the performance results.
>> What do you mean "old loop"? May be "Main loop" from split iterations optimization?
> Yes, I meant the "main loop" which we call "old loop" in PhaseIdealLoop::clone_loop():
> // We need a Region to merge the exit from the peeled body and the
> // exit from the old loop body.
You need to look on code in PhaseIdealLoop::insert_pre_post_loops() for
split iterations names.
We use next terminology.
"old loop" is original loop before iterations split.
After split we have:
"pre-loop", "main-loop", "post-loop". And Michael B. added an other
vector-post-loop for vectorization in jdk 9.
> Best regards,
>  http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/c8b709902e0e
>  http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/560423724f98
>> On 3/22/17 5:31 AM, Tobias Hartmann wrote:
>>> please review the following patch:
>>> JDK 9:
>>> JDK 8u:
>>> A CastII (or a ConvI2L before the fix for JDK-6675699 ) with a narrow type that represents the index of an array access is eliminated because its type range does not intersect with the input range. This happens because the node is pushed upwards through an AddI. However, the control path to the array access is not eliminated leaving the graph in a corrupted state. We either crash during loop optimizations because we use a value from a non-dominating region or we crash in the register allocator because a Phi has a TOP input (and therefore an undefined live range).
>>> For more details see changes in TestLoopPeeling.java:
>>> Load1 and the corresponding range check are moved out of the loop and both are used after the old loop and the peeled iteration exit. For the peeled iteration, storeIndex is always Integer.MIN_VALUE and for the old loop it is 0. Hence, the merging phi has type int:<=0. Load1 reads the array at index ConvI2L(CastII(AddI(storeIndex, -1))) where the CastII is range check dependent and has type int:>=0. The CastII gets pushed through the AddI and its type is changed to int:>=1 which does not overlap with the input type of storeIndex (int:<=0). The CastII is replaced by TOP causing a cascade of other eliminations. Since the control path through the range check CmpU(AddI(storeIndex, -1)) is not eliminated, the graph is in a corrupted state. We fail once we merge with the result of Load2 because we get data from a non-dominating region.
>>> I attempted to fix this issue with JDK-6675699 but missed these cases. The problem is similar to JDK-8154831  and affects the latest JDK 6, 7, 8 and 9.
>>> I disabled narrowing of range check dependent CastIIs (either through the CastII(AddI) optimization or through CastIINode::Ideal).
>>> Tested with customer provided reproducer, regression test and RBT (running).
>>>  https://bugs.openjdk.java.net/browse/JDK-6675699
>>>  https://bugs.openjdk.java.net/browse/JDK-8154831
More information about the hotspot-compiler-dev