RFR: 8129920 - Vectorized loop unrolling

Berg, Michael C michael.c.berg at intel.com
Tue Jun 30 00:46:21 UTC 2015

Vladimir, sure, I will rename these to reflect that we are only allowing unrolling.
For the unroll-only case, we let all the standard unrolling logic apply, without the unroll queries and their special cases. To make the two guarded cases equivalent we would need (cl->has_passed_slp() && !cl->unroll_only()). The way I have it now needs less code. I could word it differently, but it would come out about the same in the new code.
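
As a rough illustration of how the two guard expressions relate, here is a stand-alone sketch. It is not code from the webrev: LoopFlags, guard_short, guard_long and count_disagreements are hypothetical names, and LoopFlags only mimics the two flags mentioned above rather than the real HotSpot loop node.

```cpp
#include <cassert>

// Hypothetical stand-in for the two per-loop flags discussed in this thread.
// This is NOT the HotSpot loop node class; it only models two booleans.
struct LoopFlags {
  bool passed_slp_;   // superword (SLP) has already processed this loop
  bool unroll_only_;  // loop is re-entering loop opts for unrolling only
  bool has_passed_slp() const { return passed_slp_; }
  bool unroll_only() const { return unroll_only_; }
};

// Shorter guard: a single test on the SLP flag.
inline bool guard_short(const LoopFlags& cl) {
  return cl.has_passed_slp();
}

// Longer guard: additionally excludes the unroll-only state.
inline bool guard_long(const LoopFlags& cl) {
  return cl.has_passed_slp() && !cl.unroll_only();
}

// Count the flag combinations on which the two guards disagree.
inline int count_disagreements() {
  int n = 0;
  for (int p = 0; p < 2; ++p) {
    for (int u = 0; u < 2; ++u) {
      LoopFlags cl{p != 0, u != 0};
      if (guard_short(cl) != guard_long(cl)) ++n;
    }
  }
  return n;
}
```

Under these assumptions the two expressions differ on exactly one of the four flag combinations: the loop has passed SLP and is also marked unroll-only.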


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Monday, June 29, 2015 2:00 PM
To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
Subject: Re: RFR: 8129920 - Vectorized loop unrolling

ignore_slp() and NoMoreSlp would be fine names if they guarded only the superword optimization. You use them to skip all loop optimizations except unrolling, so they should be named differently. allow_unroll_only?

Why do you need set_notpassed_slp()?

+        // For atomic unrolled loops which are vector mapped, instigate more unrolling.
+        cl->set_notpassed_slp();


On 6/26/15 12:43 PM, Berg, Michael C wrote:
> Hi Folks,
> I would like to contribute Vectorized loop unrolling. I need two 
> reviewers to review this patch and comment as needed:
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8129920
> webrev:
> http://cr.openjdk.java.net/~mcberg/8129920/webrev.01/
> With this change we use the superword unrolling queries and superword 
> itself to stage re-entrance to ideal loop optimization. We do this when 
> superword succeeds in vectorizing a loop which was unroll query mapped.
> When we re-enter ideal loop optimization, we have already done all 
> major optimizations such as peeling, splitting, RCE and superword on 
> the vector map candidate loop, so we only unroll the loop. We 
> use the standard loop unrolling environment to accomplish this, 
> with default and any applicable user settings. In this way we reuse 
> the unroll factors from the baseline loop, which are much larger, to 
> obtain optimum throughput on x86 architectures. On SpecJvm2008 the 
> uplift is seen on scimark.lu.{small,large}, at 3% and 8% respectively.
> We see as much as 1.5x uplift on vector-centric micros like reductions 
> with default optimizations.
> Thanks,
> Michael
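
For readers unfamiliar with what unrolling a reduction loop means, the following is a source-level sketch only. It does not show the C2 transformation on vectorized IR described above; the unroll factor of 4 and the names sum_baseline, sum_unrolled4 and check_same are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Baseline reduction: one element per loop iteration.
inline std::int64_t sum_baseline(const std::vector<std::int32_t>& a) {
  std::int64_t s = 0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i];
  return s;
}

// Same reduction unrolled by 4 (an arbitrary factor for this sketch),
// using independent partial sums and a cleanup loop for leftover elements.
inline std::int64_t sum_unrolled4(const std::vector<std::int32_t>& a) {
  std::int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  const std::size_t n = a.size();
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += a[i];
    s1 += a[i + 1];
    s2 += a[i + 2];
    s3 += a[i + 3];
  }
  std::int64_t s = s0 + s1 + s2 + s3;
  for (; i < n; ++i) s += a[i];  // handle the remaining 0..3 elements
  return s;
}

// Sanity check: both versions must agree on any input.
inline void check_same(const std::vector<std::int32_t>& a) {
  assert(sum_baseline(a) == sum_unrolled4(a));
}
```

The unrolled form does more work per loop trip and keeps several independent accumulators, which is the general reason a larger unroll factor can help throughput on a reduction.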
