[aarch64-port-dev ] [PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request
jatin.bhateja at intel.com
Fri Mar 1 10:20:56 UTC 2019
Please see my responses inline below.
> -----Original Message-----
> From: Andrew Dinn [mailto:adinn at redhat.com]
> Sent: Friday, March 1, 2019 3:15 PM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>; Vladimir Kozlov
> <vladimir.kozlov at oracle.com>; Pengfei Li (Arm Technology China)
> <Pengfei.Li at arm.com>; B. Blaser <bsrbnd at gmail.com>; aarch64-port-
> dev at openjdk.java.net
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] [PATCH] 8217561 : X86: Add floating-point
> Math.min/max intrinsics, approval request
> On 01/03/2019 02:35, Bhateja, Jatin wrote:
> >>> I didn't quite understand Jatin's additional code below.
> >>> --
> >>> +#ifdef X86
> >>> + // Being conservative since all the phi edges may not be set
> >>> + // by now. This is done to skip over reduction scenarios.
> >>> + if (a->is_Phi() || b->is_Phi())
> >>> + return false;
> >>> +#endif
> >>> --
> >>> Is it going to black out *all* reduction scenarios? I see the
> >>> intrinsics benefit the reduction in some cases. And in my opinion,
> >>> adding this kind of platform-dependent macros in hotspot shared code
> >>> is not so good.
> > Proposed check was added based on the common reduction scenario cases
> > which showed performance degradation with new intrinsic sequence for
> That doesn't actually clarify things very well. Are you saying:
> 1a) your patch disables FPMinMax reduction for all architectures?
> 1b) your patch disables FPMinMax reduction for x86?
> 2a) it does so because when reduction is enabled x86 fails to show
> performance improvement for applications of reduction?
> 2b) it does so because when reduction is enabled x86 fails to show
> performance improvement for selection of the FPMin/Max intrinsic?
The patch currently under review does not contain the above code change, which bypassed intrinsic creation for reduction patterns.
On x86, the intrinsic performs worse than the non-intrinsic implementation in reduction
scenarios, both with and without data variance (i.e. with and without branch prediction effects).
I could not find suitable hooks that can be called from shared code to add such target-specific checks during ideal graph (DAG) construction.
Please let me know if you know of any.
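To make "reduction scenario" concrete, here is a minimal Java sketch of the kind of loop being discussed. It is not taken from the patch or from any benchmark in this thread; the class name, method names, and input generators are hypothetical and serve only as illustration. The running minimum feeds back into Math.min on every iteration, which is why one input of the min node is a loop Phi in the ideal graph. The two input generators are an assumed way to model "without" and "with" data variance.

```java
import java.util.Random;

// Hypothetical illustration only; not part of the patch under review.
class FloatMinReduction {

    // Reduction pattern: acc is carried across iterations, so the
    // Math.min call takes its own previous result as an input.
    static float minReduce(float[] data) {
        float acc = Float.POSITIVE_INFINITY;
        for (float v : data) {
            acc = Math.min(acc, v);
        }
        return acc;
    }

    // Assumed "no data variance" input: every element is identical,
    // so a compare-and-branch implementation sees one stable outcome.
    static float[] constantInput(int n, float value) {
        float[] a = new float[n];
        java.util.Arrays.fill(a, value);
        return a;
    }

    // Assumed "data variance" input: pseudo-random values, so the
    // outcome of each comparison is harder to predict.
    static float[] randomInput(int n, long seed) {
        Random r = new Random(seed);
        float[] a = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = r.nextFloat();
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(minReduce(constantInput(1024, 1.0f)));
        System.out.println(minReduce(randomInput(1024, 42L)));
    }
}
```

Note that Math.min must return NaN if either argument is NaN and must treat -0.0f as smaller than 0.0f, so any intrinsic sequence has to preserve those cases as well as the plain comparison.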
> I think you are saying 1a and 2b but I'd prefer to be sure. I would like a clear
> answer because Pengfei has a pending patch which shows significant benefit
> on AArch64 using first the FPMin/Max intrinsic and then, for extra gain,
> FPMin/Max reduction. My own investigations have not shown any detrimental
> effect to using the intrinsic or reduction and Andrew Haley seems to have
> withdrawn the claim that the intrinsic can worsen performance. So, it is quite
> important to understand what your patch does and why.
> If there is some other way to avoid the slowdown on x86 (whether that
> comes with use of the intrinsic or with use of reduction) without clobbering
> the gains to be had on AArch64 then that would be preferable.
> Andrew Dinn
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander