[9] RFR(S): 8159016: Over-unrolled loop is partially removed

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Jun 21 19:59:50 UTC 2016

Thank you, Tobias

Your changes are good to have.

You can still have a problem if unrolling is done before CCP phase which calculates array size and/or limit.

Try to delay size calculation in test until CCP phase. Something like next (please, verify):

         // Simplified during CCP:
         int size = 2;
         for (; size < 4; size *= 2);
         Object arr[] = new Object[size - 1];

Why over-unrolling check code in IdealLoopTree::policy_unroll() does not work?

I think the trip count check before main loop should collapse whole main loop code. Then we will not have this problem. 
Why it is not collapsed?

About test.

You do & 3 two times:

int lim = (arg & 3);
test(i & 3);

Use 11_000 iterations because 10000 may not trigger compilation.

Let array escape to avoid EA. For example, return it.


On 6/21/16 8:19 AM, Tobias Hartmann wrote:
> Hi,
> please review the following fix:
> https://bugs.openjdk.java.net/browse/JDK-8159016
> http://cr.openjdk.java.net/~thartmann/8159016/webrev.00/
> I was able to reproduce the problem with a simple test:
>  Object arr[] = new Object[3];
>  int lim = (arg & 3);
>  for (int i = 0; i < lim; ++i) {
>    arr[i] = new Object();
>  }
> The loop is split into pre, main and post loops. The pre loop is executed for at least one iteration, initializing arr[0]. The main loop is unrolled twice, initializing at least arr[1], arr[2], arr[3] and arr[4]. The arr[3] and arr[4] stores are always out of bounds and removed. However, C2 is unable to remove the "over-unrolled", dead main loop. As a result, there is a control path from the main loop to the PostScalarRce loop (see JDK-8151573) without a memory path since the last store was replaced by TOP. We crash because we use a memory edge from a non-dominating region.
> The out of bound stores in the over-unrolled loop are removed because their range check dependent CastII nodes are removed by the first optimization in CastIINode::Ideal():
>  CastII (AddI Phi 2) -> AddI (CastII Phi) 2 -> AddI TOP 2
> The CastIINodes are replaced by TOP because the type of the loop Phi is >= 1 and the CastII is [1..2], i.e. there is no value >= 1 that is in the [1..2] range if +2 is added.
> The underlying problem is the over-unrolling of the loop. Since lim < 3, we always only execute at least 3 loop iterations. With the pre loop executing at least one iteration, the main loop should not be unrolled more than once. This problem is similar to JDK-4834191 where we over-unrolled a loop with a constant limit.
> I changed the implementation of IdealLoopTree::compute_exact_trip_count() to not only compute the exact trip count if the loop limit is constant but to also set the _trip_count field if we are able to determine an upper bound. Like this, the checks in IdealLoopTree::policy_unroll() prevent additional loop unrolling if the trip_count became too small and we keep the maximally unrolled version.
> An alternative fix would be to disable the CastII optimization for range check dependent CastII nodes but this does not fix the underlying problem of over-unrolling.
> Tested with regression test, JPRT and RBT (running).
> Thanks,
> Tobias

More information about the hotspot-compiler-dev mailing list