Faster Math ?
joe.darcy at oracle.com
Mon Nov 13 18:25:09 UTC 2017
A few comments on this thread:
As Paul noted, a portion of fdlibm has been ported from C to Java. I do
intend to finish the port at some point. The port gives an
implementation speedup by avoiding Java -> C -> Java transition
overheads. However, the same algorithms are being used of course.
The fdlibm code was first written several decades ago and there has been
work in the interim on developing other algorithms for math libraries.
One significant effort has focused on correctly rounded libraries, that
is, libraries that have full floating-point accuracy. In particular
Jean-Michel Muller and his students and collaborators have worked in
this area and produced the crlibm package. If a specification for a
StrictMath-style class were newly written today, I would recommend it be
specified to be correctly rounded. Correct rounding is conceptually the
"best" answer and it does not require the exact implementation
algorithms to be specified to achieve reproducibility, unlike fdlibm.
However, the extra precise answer can come at the cost of extra time or
space for the computation in some cases.
The notion of a "FastMath" library has been considered before (as well
as the faster underlying numerics). As also discussed earlier in the
thread, specifying what degree of inaccuracy is acceptable for what
speed is non-obvious. (And offhand I don't know the error bounds of the
other implementations being discussed.)
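One way to make "degree of inaccuracy" concrete is to express the error in units in the last place (ulps) of a reference result. The following is only a minimal sketch: the class UlpError and its ulpError method are hypothetical helpers, not part of the JDK, and Math.sqrt is used as the reference only because its specification requires a correctly rounded result.

```java
// Hypothetical helper for illustration; nothing here is a JDK API
// except the java.lang.Math calls.
final class UlpError {

    /** Absolute difference between approx and reference, in ulps of reference. */
    static double ulpError(double approx, double reference) {
        return Math.abs(approx - reference) / Math.ulp(reference);
    }

    public static void main(String[] args) {
        // Math.sqrt is specified to be correctly rounded, so it serves as
        // the reference; Math.pow is only specified to be within 1 ulp.
        for (double x = 2.0; x < 12.0; x += 1.0) {
            double reference = Math.sqrt(x);
            double candidate = Math.pow(x, 0.5);
            System.out.println(x + ": " + ulpError(candidate, reference) + " ulps");
        }
    }
}
```

A candidate "fast" method could be dropped in place of Math.pow here to get a first impression of its error, although sampling a few inputs says nothing about its worst case.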
Working with Intel in OpenJDK, we are using optimized math library
implementations for x64 for many interesting methods. For most math
library methods, the trend has been to move to software-based
implementations rather than having specialized hardware instructions.
(Functionality like reciprocal square root is a counter-example, but we
don't have that method in the Java math library.)
Note that since 1/3 is a repeating fraction in binary and decimal,
pow(x, 1.0/3.0) is only approximately equivalent to cbrt(x).
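A small sketch of the difference, using only java.lang.Math and java.math.BigDecimal (the class and method names are made up for illustration). Besides the rounding of 1.0/3.0, the two calls also differ for negative arguments: pow is specified to return NaN for a negative base with a non-integer exponent, while cbrt is defined for negative inputs.

```java
import java.math.BigDecimal;

// Illustrative only; CbrtVersusPow is a hypothetical class name.
final class CbrtVersusPow {

    static double viaPow(double x) {
        // 1.0 / 3.0 is the double nearest to one third, not one third itself.
        return Math.pow(x, 1.0 / 3.0);
    }

    static double viaCbrt(double x) {
        return Math.cbrt(x);
    }

    public static void main(String[] args) {
        // Prints the exact decimal value of the double used as the exponent.
        System.out.println(new BigDecimal(1.0 / 3.0));

        // Negative base, non-integer exponent: pow returns NaN.
        System.out.println(viaPow(-8.0));

        // cbrt accepts negative arguments and returns a negative root.
        System.out.println(viaCbrt(-8.0));
    }
}
```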
Knowing which particular methods would be of interest for fast-but-loose
math would be helpful. The sqrt method has long been intrinsified to the
corresponding hardware instruction on many platforms so I don't think
that would be a useful candidate in most circumstances.
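To check this on a particular JDK and platform, a small hot loop can be run with HotSpot's compilation logging turned on. The following is a minimal sketch under stated assumptions: the class SqrtHotLoop is hypothetical, and what the log shows (including whether sqrt is reported as an intrinsic) depends on the JDK version and CPU.

```java
// Hypothetical test program. After compiling with javac, one possible way
// to run it is:
//   java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining SqrtHotLoop
// and then look for java.lang.Math::sqrt in the inlining output.
final class SqrtHotLoop {

    /** Sums sqrt(i * i) for i in [0, n); each square is exact in a double. */
    static double sumOfRoots(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double square = (double) i * (double) i;
            sum += Math.sqrt(square);
        }
        return sum;
    }

    public static void main(String[] args) {
        double total = 0.0;
        // Repeat the call so the JIT has a chance to compile sumOfRoots.
        for (int round = 0; round < 200; round++) {
            total += sumOfRoots(100_000);
        }
        // Print the result so the loop is not dead code.
        System.out.println(total);
    }
}
```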
In short, we might get a selection of looser but faster math methods at
some point, but not immediately and not without more investigation.
 Forward looking statements during "Forward to the Past: The Case for
Uniformly Strict Floating Point Arithmetic on the JVM"
On 11/9/2017 9:19 AM, Paul Sandoz wrote:
> Hi Laurent,
> A Java method is a candidate for intrinsification if it is annotated with @HotSpotIntrinsicCandidate. When running Java code you can also use the HotSpot flags -XX:+PrintCompilation -XX:+PrintInlining to show methods that are intrinsic (JIT watch, as mentioned, is also excellent in this regard).
> I recommend cloning OpenJDK and browsing the source.
> Some of the math functions are intrinsic in the interpreter and all the runtime compilers to ensure consistent results across interpretation and compilation.
> Work was done by Intel to improve many of the math functions. See:
> Update for x86 sin and cos in the math lib
> Update for x86 pow in the math lib
> (From these you can track related issues.)
> Other Math functions are not intrinsic like cbrt (non-native) and acos (native). There is ongoing work to turn native implementations into Java implementations (i don’t know if there would be any follow up on intrinsification).
> Joe knows more.
> As part of the Vector API effort we will likely need to investigate the support for less accurate but faster math functions. It’s too early to tell if something like a FastMath class will pop out of that, but FWIW i am sympathetic to that :-)
> I liked this tweet:
> life as a gpu compiler dev is basically just fielding repeated complaints that
> "fast math" isn't precise and "precise math" isn't fast
> as an indication of what we could be getting into :-)
>> On 9 Nov 2017, at 01:00, Laurent Bourgès <bourges.laurent at gmail.com> wrote:
>> The Marlin renderer (JEP265) uses a few Math functions: sqrt, cbrt, acos...
>> Could you check if the current JDK uses C2 intrinsics or fdlibm (native /
>> JNI overhead?) and tell me if such functions are already highly optimized
>> in jdk9 or 10 ?
>> Some people have implemented their own fast Math like Apache Commons Math
>> or JaFaMa libraries that are 10x faster for acos / cbrt.
>> I wonder if I should implement my own cbrt function (cubics) in pure java
>> as I do not need the highest accuracy but SPEED.
>> Would it sound possible to have a JDK FastMath public API (lots faster but
>> less accurate?)
>> Do you know if recent CPU (intel?) have dedicated instructions for such
>> math operations ?
>> Why not use it instead?
>> Maybe that's part of the new Vectorization API (panama) ?
>> Laurent Bourges