RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
Deshpande, Vivek R
vivek.r.deshpande at intel.com
Thu Aug 22 19:26:11 UTC 2019
On an AVX2 (256-bit vector) machine, I did not observe any difference in the generated code, which matches your observation.
But on an AVX3 (512-bit / 64-byte vector) machine, the code generated with the patch used AVX2 (256-bit) instructions instead of AVX3 (512-bit) instructions.
So with the patch, the compiler is not able to use the full vector width.
As far as performance is concerned, with the particular benchmark I shared and the given number of iterations in it, I did not observe any difference between the patch and the original.
So the difference is in the generated code, which does not use the full vector width.
From: Jie Fu [mailto:fujie at loongson.cn]
Sent: Wednesday, August 21, 2019 8:53 PM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
Thanks for your review and comments.
On 2019/8/22 3:37 AM, Deshpande, Vivek R wrote:
> The 2nd compilation (recompilation) of doit2 generates code that
> does not use the full vector width
I don't understand why we must compile with vector-512, since you didn't share separate performance numbers for vector-256 and vector-512.
With my reproducer, performance with vector-128 is much slower than with vector-64.
Does vector-512 run much faster than vector-256 on your machine?
Could you please share the detailed performance data?
> and also does not unroll after vectorization.
Sorry. I can't reproduce this issue on my computer.
Here is the compilation log from my testing.
This issue can probably only be triggered on an AVX-512 machine.
I will investigate it once my AVX-512 machine is ready.
If I missed anything, please let me know.
Thanks a lot.