RFR(M):8214751: X86: Support for VNNI instruction

Deshpande, Vivek R vivek.r.deshpande at intel.com
Fri Dec 7 18:33:15 UTC 2018

Hi Vladimir

This patch is useful for AI ML/DL applications such as convolution-based neural nets.
I have updated the patch with your suggestion.
I am now creating the MulAddS2I node late, just before vectorization.
The updated webrev is here:

I am also working on the test.


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, December 6, 2018 11:59 AM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(M):8214751: X86: Support for VNNI instruction

Hi Vivek,

Which applications benefit from this optimization?

I think MulAddS2INode is generated too early, so this optimization may prevent some constant folding, other IGVN optimizations, and RA. We only benefit if vectors are generated. Can you generate vectors without MulAddS2INode? Or create MulAddS2INode just before vectorization and expand it if vectorization fails? I would prefer the first solution: have a struct in the SuperWord code that finds such a pattern and tries to vectorize it.

You need to add a test to verify the correctness of the results.
Add a UseAVX == 0 check to the predicates that use SSE2 code. Otherwise they may be selected even if UseAVX > 0.
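
As an illustration of the kind of correctness check requested above, here is a minimal sketch. It is not the test from the webrev: the class name MulAddCheck, the helper names, the seed, and the iteration counts are all hypothetical, and the short[] inputs with an int[] accumulator are an assumption about the array types. The idea is to run the loop often enough that it is likely to be JIT-compiled and to compare it against a reference that accumulates in 64 bits and narrows at the end.

```java
import java.util.Arrays;
import java.util.Random;

public class MulAddCheck {
    // Loop shape taken from the original review request (array types assumed).
    static void kernel(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] += ((in1[2 * i] * in2[2 * i]) + (in1[2 * i + 1] * in2[2 * i + 1]));
        }
    }

    // Reference: compute the pair sum in 64 bits, then narrow to int.
    // Narrowing keeps the low 32 bits, which matches int wrap-around.
    static void reference(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i++) {
            long sum = (long) in1[2 * i] * in2[2 * i]
                     + (long) in1[2 * i + 1] * in2[2 * i + 1];
            out[i] = (int) (out[i] + sum);
        }
    }

    // Runs both versions repeatedly on the same random inputs and
    // returns false at the first mismatch.
    static boolean check(int n, int iterations, long seed) {
        Random random = new Random(seed);
        short[] in1 = new short[2 * n];
        short[] in2 = new short[2 * n];
        for (int i = 0; i < 2 * n; i++) {
            in1[i] = (short) random.nextInt();
            in2[i] = (short) random.nextInt();
        }
        int[] actual = new int[n];
        int[] expected = new int[n];
        for (int it = 0; it < iterations; it++) {
            kernel(in1, in2, actual);
            reference(in1, in2, expected);
            if (!Arrays.equals(actual, expected)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Iteration count is a guess at "enough to trigger compilation".
        System.out.println(check(1024, 20_000, 42L) ? "PASS" : "FAIL");
    }
}
```

Whether the loop is actually vectorized depends on the JVM build, flags, and CPU, so a check like this only verifies results, not which instructions were emitted.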


On 12/3/18 8:58 PM, Deshpande, Vivek R wrote:
> Hi All
> Could you please review the VNNI VPDPWSSD instruction support with autovectorization?
> It can vectorize this operation in the loop:
> out[i] += ((in1[2*i] * in2[2*i]) + (in1[2*i+1] * in2[2*i+1]));
> More information on VNNI can be found here:
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> The initial performance gain with a microbenchmark on Skylake with AVX3 is 10.8x,
> and it generates:
> vmovdqu xmm3, xmmword ptr [rbp+r8*2+0x10]
> vmovdqu xmm6, xmmword ptr [rdx+r8*2+0x10]
> vpmaddwd xmm3, xmm6, xmm3
> vpaddd xmm3, xmm3, xmmword ptr [r9+rdi*4+0x10]
> vmovdqu xmmword ptr [r9+rdi*4+0x10], xmm3
> It can generate the vpdpwssd instruction on Cascade Lake.
> The webrev is here:
> http://cr.openjdk.java.net/~vdeshpande/8214751/VNNI/webrev.00/
> The jbs entry for the same is here:
> https://bugs.openjdk.java.net/browse/JDK-8214751
> Regards,
> Vivek
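
For readers who want to try the loop pattern from the request above, here is a minimal Java sketch. The names MulAddPairs, mulAddPair, and kernel are hypothetical, and the short[] inputs with an int[] accumulator are an assumption based on the 16-bit to 32-bit multiply-add that vpmaddwd and vpdpwssd perform; the thread does not show the array declarations.

```java
public class MulAddPairs {
    // Scalar model of one 32-bit result: multiply two adjacent signed
    // 16-bit pairs and add the two products. Java promotes short to int,
    // so the sum wraps on overflow like ordinary int arithmetic.
    static int mulAddPair(short a0, short b0, short a1, short b1) {
        return (a0 * b0) + (a1 * b1);
    }

    // The loop shape from the review request: each output element
    // accumulates the multiply-add of one adjacent input pair.
    static void kernel(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] += ((in1[2 * i] * in2[2 * i]) + (in1[2 * i + 1] * in2[2 * i + 1]));
        }
    }

    public static void main(String[] args) {
        short[] in1 = {1, 2, 3, 4};
        short[] in2 = {5, 6, 7, 8};
        int[] out = {1, 2};
        kernel(in1, in2, out);
        System.out.println(out[0] + " " + out[1]); // prints "18 55"
    }
}
```

The only input for which the pair sum itself wraps is when all four values are Short.MIN_VALUE: each product is 2^30, and their sum 2^31 wraps to Integer.MIN_VALUE. A correctness test may want to include that case.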
