Vectorized Loop Unrolling on x64?
maaartinus at gmail.com
Fri Oct 27 19:46:06 UTC 2017
IIUC, the code on slide 90 is slow because of a data dependency: the
single accumulator sum is the bottleneck. A long time ago, I played with
unrolling such a loop manually using multiple accumulators and gained a
factor of maybe 3. But this is well known, so I wonder what I am missing.
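To make the multiple-accumulator idea concrete, here is a minimal sketch
of what I mean. It is not the code from the slides; the class and method
names are made up for illustration, and I use int elements so that
reordering the additions cannot change the result (int addition wraps and
is associative, which is not true for float or double).

```java
// Hypothetical illustration only; not taken from the talk or from HotSpot.
final class MultiAccumulatorSum {

    // Baseline: each addition depends on the previous one through `sum`,
    // so the loop forms a single dependency chain.
    static int sumSingle(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Manually unrolled by 4 with four independent accumulators, giving
    // four shorter dependency chains that can proceed in parallel.
    static int sumUnrolled4(int[] a) {
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        int limit = a.length - 3; // last index where a[i + 3] is in bounds, plus one
        for (; i < limit; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        // Combine the partial sums, then handle the 0..3 leftover elements.
        int sum = (s0 + s1) + (s2 + s3);
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }
}
```

Both methods return the same value for any int[]; whether the unrolled
one is actually faster depends on the JIT and the CPU, so it would have
to be measured (e.g. with JMH) rather than assumed.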
IMHO there's no reason why sum += A[i] should be slower than B[i] += A[i]
assuming a sufficient iteration count.
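A sketch of why I think the two should cost about the same: a reduction
can be rewritten as an element-wise lanes[j] += A[i + j] over a small
array of partial sums, followed by a single fold whose cost is constant
and therefore negligible for long arrays. The class name and the LANES
constant below are assumptions for illustration (8 ints would match a
256-bit vector register); this is not what C2 actually emits.

```java
// Hypothetical illustration only; LANES is an assumed vector width.
final class LaneReduction {

    static final int LANES = 8;

    static int sumViaLanes(int[] a) {
        int[] lanes = new int[LANES]; // one partial sum per "vector lane"
        int i = 0;
        int limit = a.length - (LANES - 1);
        for (; i < limit; i += LANES) {
            // Same shape as B[j] += A[i + j]: the lanes are independent
            // of each other within one step.
            for (int j = 0; j < LANES; j++) {
                lanes[j] += a[i + j];
            }
        }
        // One-time fold of the partial sums (constant cost).
        int sum = 0;
        for (int j = 0; j < LANES; j++) {
            sum += lanes[j];
        }
        // Scalar tail for the fewer-than-LANES leftover elements.
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }
}
```

The only work the reduction does beyond the element-wise loop is the
final fold and the tail, both bounded by LANES, which is where the
"sufficient iteration count" assumption comes in.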
On Tue, Oct 24, 2017 at 7:20 PM, Vladimir Sitnikov <
sitnikov.vladimir at gmail.com> wrote:
> Just in case, here's Vladimir Ivanov's vectorization talk: http://2017.jpoint.ru/en/talks/vector-programming-in-java/
> Slide 89 describes sum misundervectorization.