Vectorized Loop Unrolling on x64?

Martin Grajcar maaartinus at
Fri Oct 27 19:46:06 UTC 2017

IIUIC, the code on slide 90 is slow because of a data dependency: the single
accumulator sum is the bottleneck. A very long time ago, I played with
unrolling it manually using multiple accumulators and gained a factor of
maybe 3. But this is well known, so I wonder what I am missing.
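A minimal sketch of the manual unrolling described above, assuming an int[] input and a 4-way split (the factor of 4 and the class/method names are hypothetical, not taken from the talk or from any JDK code). With one accumulator, every addition must wait for the previous one. With four, the dependency chains are independent and can overlap in the pipeline, and they are combined only after the loop.

```java
// Hypothetical helper class for illustration; not from the talk or the JDK.
final class Accumulators {

    // Single accumulator: each add depends on the result of the previous add,
    // so throughput is limited by the latency of one addition.
    static long sumSingle(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Four independent accumulators: the four chains do not depend on each
    // other, so the CPU can execute their additions in parallel.
    static long sumFour(int[] a) {
        long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        int limit = a.length - 3; // a[i + 3] stays in bounds while i < limit
        for (; i < limit; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        // Tail loop: handles the at most three leftover elements.
        for (; i < a.length; i++) {
            s0 += a[i];
        }
        // Combine the partial sums once, after the hot loop.
        return (s0 + s1) + (s2 + s3);
    }
}
```

For integer types, both methods always return the same value, because two's-complement addition is associative. For float or double, splitting the sum reorders the additions and can change the rounded result, which is why a compiler may not apply this transformation to a floating-point loop on its own.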

IMHO, there's no reason why sum += A[i] should be slower than B[i] += A[i],
given a sufficiently large iteration count.
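The two loop shapes being compared can be sketched as follows, assuming int arrays (class and method names are hypothetical). In the element-wise loop, each iteration reads and writes only its own index, so the iterations are independent. In the reduction, every iteration depends on the previous value of sum. The argument is that this chain can be broken by splitting sum into several partial sums (scalar or vector lanes), after which both loops do comparable work per element.

```java
// Hypothetical class for illustration only.
final class LoopShapes {

    // Reduction: loop-carried dependency through `sum`.
    static int reduce(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Element-wise add: iteration i touches only a[i] and b[i],
    // so no iteration depends on another.
    static void addInto(int[] b, int[] a) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            b[i] += a[i];
        }
    }
}
```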

On Tue, Oct 24, 2017 at 7:20 PM, Vladimir Sitnikov <
sitnikov.vladimir at> wrote:

> Just in case, here's Vladimir Ivanov's vectorization talk:
> <>
> Slide 89 describes sum misundervectorization.
> Vladimir

More information about the hotspot-compiler-dev mailing list