Faster Math ?
bourges.laurent at gmail.com
Fri Nov 17 10:08:36 UTC 2017
Some context first:
- The Marlin renderer is now the default JDK & JFX renderer. Please consider
improving the performance of the following 2 Math functions: cbrt, acos.
- I work in public astrophysics research, writing astronomy software as
Java desktop apps (javaws + scientific computations); see
http://jmmc.fr . It is hard to promote Java in science as both the Python &
Julia languages are widespread.
Please consider any change making Java more competitive for Science:
- faster math, more math functions (matrix & vector API), FFT, GPU
computing... the Panama project is very important to me
- struct (value type) & Friend interface (fast native lib reuse) are
promising features ... Valhalla ?
Now I answer below:
Thanks for your feedback
As Paul noted, a portion of fdlibm has been ported from C to Java. I do
intend to finish the port at some point. The port gives an implementation
speedup by avoiding Java -> C -> Java transition overheads. However, the
same algorithms are being used of course.
The fdlibm code was first written several decades ago and there has been
work in the interim on developing other algorithms for math libraries. One
significant effort has focused on correctly rounded libraries, that is,
libraries that have full floating-point accuracy. In particular Jean-Michel
Muller and his students and collaborators have worked in this area and
produced the crlibm package. If a specification for a StrictMath-style class
were newly written today, I would recommend it be specified to be correctly
rounded. Correct rounding is conceptually the "best" answer and it does not
require the exact implementation algorithms to be specified to achieve
reproducibility, unlike fdlibm.
Accuracy is important but what is the cost ? Java has 2 Math
implementations: Math & StrictMath... but also the strictfp keyword.
So the Math class is the JDK fast math... compared to StrictMath.
Maybe it could give less accurate results: off in the last 1 or 2 digits ...
or maybe by 10 or 100 ulps ?
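
To make the "how many ulps" question concrete, here is a minimal sketch of
how the gap between two implementations could be measured. The class name
UlpDistance and the helper ulpDistance are hypothetical names chosen for
this illustration; only Math.ulp, Math.acos and StrictMath.acos are real
JDK methods. The printed value depends on the JDK and platform, so no
expected output is given.

```java
class UlpDistance {
    // Hypothetical helper: distance between two doubles, expressed in
    // units in the last place (ulps) of the reference value.
    static double ulpDistance(double reference, double candidate) {
        return Math.abs(candidate - reference) / Math.ulp(reference);
    }

    public static void main(String[] args) {
        double worst = 0.0;
        // Sample acos on (0, 1) and treat StrictMath as the reference.
        for (int i = 1; i < 1000; i++) {
            double x = i / 1000.0;
            double d = ulpDistance(StrictMath.acos(x), Math.acos(x));
            worst = Math.max(worst, d);
        }
        System.out.println("max ulp distance, Math.acos vs StrictMath.acos: " + worst);
    }
}
```

The same loop could be pointed at any candidate "fast" implementation to
see whether it is off by 1, 10 or 100 ulps on the inputs that matter.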
However, the extra precise answer can come at the cost of extra time or
space for the computation in some cases.
The notion of a "FastMath" library has been considered before (as well as
the faster underlying numerics). As also discussed earlier in the
thread, specifying what degree of inaccuracy is acceptable for what speed
is non-obvious. (And offhand I don't know the error bounds of the other
implementations being discussed.)
Please look at JaFaMa on GitHub, whose FastMath gives correct results at
1e-15 precision and is very fast.
I will give you my benchmark results on jdk9...
Working with Intel in OpenJDK, we are using optimized math library
implementations for x64 for many interesting methods. For most math library
methods, the trend has been to move to software-based implementations
rather than having specialized hardware instructions. (Functionality like
reciprocal square root is a counter-example, but we don't have that method
in the Java math library.)
Please port all the math code to Java first, delete the fdlibm native code
and later make intrinsics for the most used methods (any math used within
jdk or jfx...)
Who could help ?
Note that since 1/3 is a repeating fraction in binary and decimal, pow(x,
1.0/3.0) is only approximately equivalent to cbrt(x).
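
A small sketch of why the two are not interchangeable. The class name
CbrtVersusPow and the helper cbrtViaPow are hypothetical; the JDK calls are
real. It prints the exact value of the double nearest to 1/3, and shows the
most visible difference: Math.pow returns NaN for a negative base with a
non-integer exponent, while Math.cbrt is defined for negative inputs.

```java
import java.math.BigDecimal;

class CbrtVersusPow {
    // Hypothetical helper: the "obvious" substitute for cbrt.
    static double cbrtViaPow(double x) {
        return Math.pow(x, 1.0 / 3.0);
    }

    public static void main(String[] args) {
        // Exact decimal expansion of the double closest to 1/3;
        // it is slightly below the true value of 1/3.
        System.out.println(new BigDecimal(1.0 / 3.0));

        // cbrt handles negative arguments ...
        System.out.println(Math.cbrt(-8.0));
        // ... whereas pow with a negative base and a non-integer
        // exponent is specified to return NaN.
        System.out.println(cbrtViaPow(-8.0));
    }
}
```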
Knowing which particular methods would be of interest for fast-but-loose
math would be helpful. The sqrt method has long been intrinsified to the
corresponding hardware instruction on many platforms so I don't think that
would be a useful candidate in most circumstances.
Yes, but there is no cbrt intrinsic! It is important for our cubic curve
solver.
ACOS / ASIN are still slow.
I could make the port... in java.
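
For illustration only, this is the kind of fast-but-loose cbrt I have in
mind: a minimal sketch, not the fdlibm port and not JaFaMa's algorithm. The
class FastCbrtSketch, the method fastCbrt and the SEED constant are all
names and values chosen for this example. It builds a rough initial guess
by dividing the raw bit pattern by 3 and then refines it with three Newton
steps. Whether it is actually faster than Math.cbrt has to be benchmarked.

```java
class FastCbrtSketch {
    // Assumption: bias term (about 2/3 of the double exponent bias,
    // slightly lowered) so that bits/3 + SEED roughly approximates
    // the bit pattern of the cube root.
    private static final long SEED = 0x2A9F789300000000L;

    static double fastCbrt(double x) {
        double a = Math.abs(x);
        // The bit trick only works for normal finite values; zero,
        // subnormals, infinities and NaN fall back to the JDK method.
        if (!(a >= Double.MIN_NORMAL) || Double.isInfinite(a)) {
            return Math.cbrt(x);
        }
        // Rough guess, accurate to a few percent.
        double y = Double.longBitsToDouble(Double.doubleToRawLongBits(a) / 3 + SEED);
        // Newton iteration for y^3 = a; each step roughly squares
        // the relative error.
        for (int i = 0; i < 3; i++) {
            y = (2.0 * y + a / (y * y)) / 3.0;
        }
        return Math.copySign(y, x);
    }

    public static void main(String[] args) {
        double[] samples = {27.0, -8.0, 0.001, 1e300};
        for (double x : samples) {
            System.out.println(x + " -> " + fastCbrt(x) + " (Math.cbrt: " + Math.cbrt(x) + ")");
        }
    }
}
```

The number of Newton steps is the accuracy / speed knob: fewer steps give a
looser but cheaper result, which is the trade-off under discussion here.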
In short, we might get a selection of looser but faster math methods at
some point, but not immediately and not without more investigation.
 Forward looking statements during "Forward to the Past: The Case for
Uniformly Strict Floating Point Arithmetic on the JVM"
On 11/9/2017 9:19 AM, Paul Sandoz wrote:
> Hi Laurent,
> A Java method is a candidate for intrinsification if it is annotated with
> @HotSpotIntrinsicCandidate. When running Java code you can also use the
> HotSpot flags -XX:+PrintCompilation -XX:+PrintInlining to show methods that
> are intrinsic (JIT watch, as mentioned, is also excellent in this regard).
> I recommend cloning OpenJDK and browsing the source.
> Some of the math functions are intrinsic in the interpreter and all the
> runtime compilers to ensure consistent results across interpretation and
> compilation.
> Work was done by Intel to improve many of the math functions. See:
> Update for x86 sin and cos in the math lib
> Update for x86 pow in the math lib
> (From these you can track related issues.)
> Other Math functions are not intrinsic like cbrt (non-native) and acos
> (native). There is ongoing work to turn native implementations into Java
> implementations (i don’t know if there would be any follow up on
> Joe knows more.
> As part of the Vector API effort we will likely need to investigate the
> support for less accurate but faster math functions. It’s too early to tell
> if something like a FastMath class will pop out of that, but FWIW i am
> sympathetic to that :-)
> I liked this tweet:
> life as a gpu compiler dev is basically just fielding repeated
> complaints that
> "fast math" isn't precise and "precise math" isn't fast
> as an indication of what we could be getting into :-)
> On 9 Nov 2017, at 01:00, Laurent Bourgès <bourges.laurent at gmail.com>
>> The Marlin renderer (JEP265) uses a few Math functions: sqrt, cbrt, acos...
>> Could you check if the current JDK uses C2 intrinsics or fdlibm (native /
>> JNI overhead?) and tell me if such functions are already highly optimized
>> in jdk9 or 10 ?
>> Some people have implemented their own fast Math like Apache Commons Math
>> or JaFaMa libraries that are 10x faster for acos / cbrt.
>> I wonder if I should implement my own cbrt function (cubics) in pure java
>> as I do not need the highest accuracy but SPEED.
>> Would it sound possible to have a JDK FastMath public API (lots faster but
>> less accurate?)
>> Do you know if recent CPUs (intel?) have dedicated instructions for such
>> math operations ?
>> Why not use it instead?
>> Maybe that's part of the new Vectorization API (panama) ?
>> Laurent Bourges