[aarch64-port-dev ] RFR(M): 8212043: Add floating-point Math.min/max intrinsics
Pengfei Li (Arm Technology China)
Pengfei.Li at arm.com
Fri Oct 26 10:36:23 UTC 2018
> So, (probably) the commonest use of min() is faster with the intrinsic, but
> only just. Nevertheless I think this patch should go in because it can easily be
> vectorized, and that should be the next step.
I found a reason why consecutive fmins are slower. The fmin sequence generated by the nested min() calls has RAW (read-after-write) data dependencies: one fmin writes an FP register and the next fmin reads the same register. This causes the instruction pipeline to stall frequently. Anyway, thanks for your test case, which helped me understand this point.
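To illustrate the dependency shape (not the actual benchmark from this thread), here is a minimal Java sketch. The class and method names are hypothetical, and whether C2 emits an fmin per call depends on the JVM build. chainedMin() forms one serial chain where each Math.min needs the previous result; splitMin() keeps two independent accumulators, so adjacent min operations do not read each other's output.

```java
public class MinChainSketch {
    // Serial chain: every Math.min reads the result of the previous one,
    // which is the read-after-write dependency described above.
    public static float chainedMin(float[] a) {
        float m = a[0];
        for (int i = 1; i < a.length; i++) {
            m = Math.min(m, a[i]);
        }
        return m;
    }

    // Two independent accumulators: the two Math.min calls in the loop body
    // do not depend on each other, only on their own previous value.
    public static float splitMin(float[] a) {
        float m0 = a[0];
        float m1 = a[0];
        int i = 1;
        for (; i + 1 < a.length; i += 2) {
            m0 = Math.min(m0, a[i]);
            m1 = Math.min(m1, a[i + 1]);
        }
        if (i < a.length) {
            // Odd leftover element.
            m0 = Math.min(m0, a[i]);
        }
        return Math.min(m0, m1);
    }

    public static void main(String[] args) {
        float[] data = {3.5f, -1.0f, 7.25f, 0.0f, -4.5f, 2.0f};
        System.out.println(chainedMin(data)); // -4.5
        System.out.println(splitMin(data));   // -4.5
    }
}
```

Both methods return the same value, including NaN propagation, because Math.min gives the same result regardless of how the operands are grouped. Any speed difference between them would have to be measured on the target hardware.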
I think we still need another reviewer to look at this patch, since I modified shared code, although there should be no effect on other platforms. Could any other reviewer help? http://cr.openjdk.java.net/~pli/rfr/8212043/webrev.01/
> I have a sort-of meta-suggestion. Everyone should be hostile to their own
> patches: we all need to hate our own code! This sounds crazy, but it's true.
> When you write a patch, you should deliberately write test code which shows
> your patch at its very worst. And your benchmarks should measure the worst
> case performance of your optimization.
I agree. This will be the most useful suggestion for my future work.