RFR 8076276 support for AVX512

Berg, Michael C michael.c.berg at intel.com
Wed Apr 29 21:39:58 UTC 2015

Thanks, Vladimir, for the review and for sponsoring this set of changes.
Could a second reviewer please take a look at this patch and comment as needed?

Thanks in advance,

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, April 29, 2015 2:11 PM
To: hotspot-compiler-dev at openjdk.java.net
Cc: Berg, Michael C
Subject: Re: RFR 8076276 support for AVX512

For the record, I reviewed it and I think it is good.


On 4/23/15 12:24 PM, Vladimir Kozlov wrote:
> Updated webrev:
> http://cr.openjdk.java.net/~kvn/8076276/webrev.02
> Passed JPRT testing.
> Changes:
> The assembler layer now handles KNL as well for EVEX; it is a target that
> will be available earlier than Skylake server. This is done by carefully
> managing cpuid information and applying each machine's characteristics to
> its code generation model. I also added support for 32-bit compilation via
> the machine description, which manages many of the same things as in
> 64-bit, with some additions for instruction size calculations. One example
> is a static function that answers the question of displacement size for
> memory offsets. You will see two versions. One modifies the offset and
> answers the question of size range. The other statically takes, as input,
> all of the object data that its dynamic counterpart holds, and determines
> whether the displacement fits the short form. The first runs dynamically
> as part of assembler processing in its allocated object; the second is
> meant to be run statically. There is also a dummy region in the 32-bit
> register description of floating point registers, which is used to stage
> regmask alignment for the xmm register bank on that target. I do this so
> that I can use the same code for both compiler models with respect to
> register mask handling of vector components.
> Please also note the new long Java tests in superword. The aforementioned
> zmm save region for OS vector testing was ported to run in KNL mode. The
> call save regions have been extended for both compilation models to handle
> their respective register banks and are working correctly.
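The two displacement-size helpers described above can be sketched as follows. This assumes they implement EVEX compressed displacement (disp8*N). In that scheme, an 8-bit displacement is implicitly scaled by a factor N derived from the instruction's tuple type and vector length. The function names are hypothetical and are not HotSpot's actual assembler API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of EVEX disp8*N displacement compression; these names
// are illustrative, not HotSpot's functions. n is the scale factor implied by
// the instruction's tuple type and vector length (for example, 64 for a full
// 512-bit memory operand; n == 1 gives the legacy disp8 rule).

// Static query: answers "does disp fit the compressed 8-bit form?" without
// modifying anything.
bool query_compressed_disp8(int32_t disp, int32_t n) {
  assert(n > 0);
  if (disp % n != 0) {
    return false;                       // must be an exact multiple of N
  }
  const int32_t scaled = disp / n;
  return scaled >= -128 && scaled <= 127;  // scaled value must fit in int8
}

// Dynamic variant: same test, but on success rewrites disp in place to the
// scaled byte value that would actually be emitted.
bool compress_disp8(int32_t& disp, int32_t n) {
  if (!query_compressed_disp8(disp, n)) {
    return false;
  }
  disp /= n;
  return true;
}
```

With N = 64, a displacement of 192 is emitted as the single byte 3. The legacy encoding (N = 1) would need a 32-bit displacement for the same offset, because 192 exceeds 127.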
> Thanks,
> Michael
> On 4/9/15 4:53 PM, Vladimir Kozlov wrote:
>> Michael,
>> Thank you for the detailed explanation. Let me clarify my request:
>> 1. I am fine with kmov and KRegister definitions and usage in
>> assembler, macroassembler and stubs.
>> 2. I don't want KRegister and MoveK in C2 code (opto/ and .ad files)
>> until we have full support for them in RA and signal processing.
>> Thanks,
>> Vladimir
>> On 4/9/15 4:02 PM, Berg, Michael C wrote:
>>> Vladimir, some explanation of the EVEX encoding model is needed:
>>> Some instructions are agnostic to vector length and can take the
>>> implicit k0 definition in the encoding. Other instructions must have
>>> predicate definitions for applying their mask to SIMD lanes, and these
>>> explicitly exclude k0. Predicate mask registers must be in the range
>>> k1..k7 and hold a real definition, which code must provide with a
>>> mask value. An EVEX-enabled machine environment does not
>>> automatically initialize any of the mask-assignable registers
>>> (k1..k7), so we must emit kmov instructions that load an
>>> immediate value from a GPR. You will see code such as this
>>> in the review. This effectively means KRegister must stay in the
>>> implementation, but I can accommodate the lion's share of what you
>>> have indicated. The places where KRegister is used via the
>>> assembler layer are:
>>> src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265,
>>> src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it needs one too"
>>> src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046
>>> This stands in for formal register allocation for now, and will also
>>> serve when we do more extravagant things with SIMD masks. I will keep
>>> the webrev around so I can easily add these pieces back in, as we are
>>> going to need them.
>>> There are also many other mask register instructions in the ISA
>>> that we will need to make use of in the future. If this is
>>> acceptable, I will look into the other changes and resend the webrev accordingly modified.
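Below is a behavioral model of the opmask rules explained above, written in plain C++ rather than as assembler. It shows three things:

- Mask field 0 means "no masking".
- k1..k7 must be loaded explicitly, modeled here as a kmovw from a GPR value.
- Instructions that require a real predicate cannot use k0.

This assumes EVEX gathers as the example of such an instruction. The types and functions are hypothetical and only simulate the semantics. They are not HotSpot's KRegister or MacroAssembler interfaces.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Behavioral model only (not HotSpot code): EVEX opmask predication on a
// 512-bit vector of 16 x int32.
using Vec16i = std::array<int32_t, 16>;

struct OpmaskFile {
  // k0..k7. Zeroed here only to keep the model deterministic; real code
  // must not rely on k1..k7 holding any particular value.
  std::array<uint64_t, 8> k{};

  // Models "mov eax, imm; kmovw kN, eax": the mask is staged in a
  // general-purpose register and then moved into the opmask register.
  void kmov_from_gpr(int reg, uint64_t gpr_value) {
    assert(reg >= 0 && reg <= 7);
    k[reg] = gpr_value & 0xFFFF;  // kmovw transfers the low 16 bits
  }
};

// EVEX.aaa holds the opmask register number; 0 selects "no masking".
// Instructions that need a real predicate (assumed example: EVEX gathers)
// cannot be encoded with 0.
int evex_aaa_field(int kreg, bool predicate_required) {
  assert(kreg >= 0 && kreg <= 7);
  assert(!(predicate_required && kreg == 0) && "k0 cannot be a predicate here");
  return kreg;
}

// Models "vpaddd dst {kN}{z}, a, b". Lanes whose mask bit is clear keep the
// old dst value (merge-masking) or become 0 (zero-masking).
Vec16i masked_add(const OpmaskFile& f, int kreg, bool zeroing,
                  const Vec16i& dst, const Vec16i& a, const Vec16i& b) {
  const uint64_t mask =
      (evex_aaa_field(kreg, false) == 0) ? 0xFFFF : f.k[kreg];
  Vec16i out = dst;
  for (int i = 0; i < 16; ++i) {
    if ((mask >> i) & 1) {
      out[i] = a[i] + b[i];
    } else if (zeroing) {
      out[i] = 0;
    }
  }
  return out;
}
```

Loading k1 with 0x00FF before the add, for example, updates only lanes 0..7. The remaining lanes either keep their old values or are zeroed, depending on the masking mode.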
>>> Thanks,
>>> Michael
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, April 08, 2015 1:33 PM
>>> To: Berg, Michael C
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR 8076276 support for AVX512
>>> Michael, please make sure to include the mailing lists in replies; this
>>> is part of the review process.
>>> I understand that K registers may be important, but I don't see the
>>> need to include them in these changes, which are already huge. We can
>>> do that as separate changes unless you can point me to where they are
>>> critically needed for AVX512 instructions.
>>> I don't see them used in the current changes, which simply widen
>>> vectors to 512 bits.
>>> I am concerned that the K register implementation is incomplete, but it
>>> is hard to see and review it in the current changes.
>>> Regards,
>>> Vladimir
>>> On 4/8/15 1:09 PM, Berg, Michael C wrote:
>>>> Vladimir, RegK is needed as it frames the kmov instructions which 
>>>> utilize KRegister and the enumerated k registers, which are 
>>>> critically needed and used, although not yet matched (we use k1 and 
>>>> k0 now).  I will look into the rest of the comments.  The plan
>>>> is to register allocate the k registers at some point though.
>>>> Thanks,
>>>> Michael
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev
>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
>>>> Vladimir Kozlov
>>>> Sent: Wednesday, April 08, 2015 12:36 PM
>>>> To: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR 8076276 support for AVX512
>>>> I suggest removing MoveK and RegK from these changes since
>>>> they are not used.
>>>> We can add them later when you have the use case.
>>>> sharedRuntime_x86_64.*: you should have code, not a comment:
>>>> // TODO: add ZMM save code
>>>> vm_version_x86.cpp: add code to verify that the system preserves Z
>>>> registers during interrupts. See the code after the comment:
>>>> // Some OSs have a bug when upper 128bits of YMM
>>>> I see the following pattern repeated in C1 code. It should be moved
>>>> into a function in FrameMap:
>>>> +        int num_caller_save_xmm_regs = FrameMap::nof_caller_save_xmm_regs;
>>>> +#if _LP64
>>>> +        if (UseAVX < 3) {
>>>> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
>>>> +        }
>>>> +#endif
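Here is one possible shape for the FrameMap helper requested above. It rests on these assumptions:

- The helper is a static function, given the hypothetical name get_num_caller_save_xmms.
- nof_caller_save_xmm_regs counts all 32 XMM registers on 64-bit and 8 on 32-bit.
- UseAVX and FrameMap are minimal stand-ins so the sketch compiles on its own.

```cpp
#include <cassert>

// Minimal stand-ins so the sketch compiles on its own. In HotSpot, UseAVX is
// a VM flag and FrameMap belongs to C1; the values below are assumptions.
static int UseAVX = 3;

class FrameMap {
 public:
#ifdef _LP64
  static constexpr int nof_caller_save_xmm_regs = 32;  // assumed: xmm0..xmm31
#else
  static constexpr int nof_caller_save_xmm_regs = 8;   // assumed: xmm0..xmm7
#endif

  // Hypothetical helper replacing the repeated C1 pattern: returns how many
  // XMM registers exist on this CPU and must be caller-saved. On 64-bit,
  // xmm16..xmm31 are only available with AVX-512 (UseAVX >= 3).
  static int get_num_caller_save_xmms() {
    int num = nof_caller_save_xmm_regs;
#ifdef _LP64
    if (UseAVX < 3) {
      num = num / 2;
    }
#endif
    assert(num > 0);
    return num;
  }
};
```

Each repeated block in C1 would then collapse to a single call:

    int num_caller_save_xmm_regs = FrameMap::get_num_caller_save_xmms();

The sketch tests _LP64 with #ifdef rather than #if, so it does not depend on _LP64 being defined to a nonzero value.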
>>>> In general, we should avoid using #ifdef X86 in shared code:
>>>> matcher.cpp. This file will not be an issue if you remove RegK from
>>>> the changes.
>>>> c2compiler.cpp: can you move that code to
>>>> Compile::pd_compiler2_init(), which is platform specific?
>>>> matcher.cpp: typo 'eno':
>>>> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for spills.
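The "64 bytes (16 slots)" figure in the comment above follows from register-allocator stack slots being 32 bits wide. The compile-time check below verifies that arithmetic for each vector register class. It assumes 4-byte slots, and the slots_for_bits helper is illustrative only.

```cpp
// Illustrative only: vector spill sizes expressed in 32-bit (4-byte)
// register-allocator stack slots.
constexpr int kSlotBytes = 4;

constexpr int slots_for_bits(int vector_bits) {
  return (vector_bits / 8) / kSlotBytes;
}

static_assert(slots_for_bits(32) == 1, "VecS: 4 bytes = 1 slot");
static_assert(slots_for_bits(64) == 2, "VecD: 8 bytes = 2 slots");
static_assert(slots_for_bits(128) == 4, "VecX: 16 bytes = 4 slots");
static_assert(slots_for_bits(256) == 8, "VecY: 32 bytes = 8 slots");
static_assert(slots_for_bits(512) == 16, "VecZ: 64 bytes = 16 slots");
```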
>>>> Thanks,
>>>> Vladimir
>>>> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>>>>> Hi Folks,
>>>>> We (Intel) would like to contribute initial support for AVX512
>>>>> (EVEX encoding, new register support, new ISA support,
>>>>> etc.) for EVEX-enabled microarchitectures.
>>>>> The contribution is tracked as Bug ID 8076276, a performance
>>>>> enhancement.
>>>>> Please review this patch and comment as needed:
>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>>>> Loops covered by SuperWord optimizations on the vectorization path
>>>>> see up to a 50% reduction in the instruction count of the loop trace,
>>>>> which makes up the path length of EVEX-encoded SIMD-optimized loops.
>>>>> Vladimir Kozlov has offered to sponsor this patch.
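Here is an idealized reading of the up-to-50% figure. If each main-loop iteration executes the same number of instructions, then doubling the vector width from 256 to 512 bits halves the iteration count, and the loop trace halves with it. The sketch ignores unrolling, alignment pre-loops, and scalar post-loops. The per-iteration instruction count is an arbitrary assumption, and the model does not reproduce the measurements reported above.

```cpp
#include <cassert>
#include <cstdint>

// Idealized model: vector iterations of the main loop over n elements,
// ignoring unrolling, alignment pre-loops and the scalar post-loop.
int64_t main_loop_iterations(int64_t n, int vector_bits, int elem_bits) {
  assert(elem_bits > 0 && vector_bits % elem_bits == 0);
  const int64_t lanes = vector_bits / elem_bits;
  return n / lanes;  // leftover elements would be handled by the post-loop
}

// Assumes every iteration executes the same fixed number of instructions
// (e.g. two loads, one add, one store, plus loop control), independent of
// the vector width.
int64_t main_loop_instructions(int64_t n, int vector_bits, int elem_bits,
                               int insns_per_iteration) {
  return main_loop_iterations(n, vector_bits, elem_bits) * insns_per_iteration;
}
```

Take 1024 int elements as an example. With 256-bit vectors (8 lanes) the main loop runs 128 iterations, and with 512-bit vectors (16 lanes) it runs 64. The modeled trace therefore shrinks by 50%. The real reduction is smaller whenever fixed pre-loop and post-loop work is significant.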
