Optimizing saturated casts
linuxhippy at gmail.com
Mon Dec 22 14:06:35 PST 2008
I currently make heavy use of a utility class in the XRender Java2D
backend, which performs saturated casts:
1.) return (short) (x > Short.MAX_VALUE ? Short.MAX_VALUE : (x < Short.MIN_VALUE ? Short.MIN_VALUE : x));
2.) return (short) (x > 65535 ? 65535 : (x < 0) ? 0 : x);
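For reference, the two one-liners above could be wrapped roughly as
follows. This is only a sketch: the class and method names are made up
for illustration and are not the ones used in the backend, and the
parameter is assumed to be an int.

```java
// Hypothetical utility class; names are illustrative only.
final class SaturatedCasts {
    private SaturatedCasts() {}

    // Variant 1: clamp x to the signed 16-bit range [-32768, 32767].
    static short toShort(int x) {
        return (short) (x > Short.MAX_VALUE ? Short.MAX_VALUE
                : (x < Short.MIN_VALUE ? Short.MIN_VALUE : x));
    }

    // Variant 2: clamp x to the unsigned 16-bit range [0, 65535].
    // The result is carried in a short, so values above 32767 appear
    // negative in Java; mask with 0xFFFF to read the unsigned value.
    static short toUnsignedShort(int x) {
        return (short) (x > 65535 ? 65535 : (x < 0 ? 0 : x));
    }
}
```

Note that in the second variant only the bit pattern matters, e.g.
(SaturatedCasts.toUnsignedShort(70000) & 0xFFFF) gives 65535.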
I spent quite some time benchmarking and tuning the
protocol-generation methods, and a lot of cycles are spent in those
saturated casts, even though the utility methods are static.
E.g. XRenderFillRectangle takes 40 cycles without clamping, but
70 cycles with it, on my Core 2 Duo with hotspot-server/jdk 14.0.
HotSpot always seems to implement these clamps with conditional
jumps, although well-predictable ones.
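One formulation that might be worth comparing is the same clamp written
with Math.min/Math.max instead of nested conditionals. This is only a
sketch with a made-up class name; whether HotSpot compiles it to
conditional moves rather than jumps is an assumption that would have to
be checked in the generated code.

```java
// Hypothetical alternative; names are illustrative only.
final class MinMaxClamp {
    private MinMaxClamp() {}

    // Same result as the signed saturated cast, expressed via min/max.
    static short toShort(int x) {
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, x));
    }

    // Same result as the unsigned saturated cast (bit pattern in a short).
    static short toUnsignedShort(int x) {
        return (short) Math.max(0, Math.min(65535, x));
    }
}
```

Both methods return the same values as the conditional versions, so
they can be swapped in for benchmarking without changing behaviour.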
Modern processors seem to have support for this kind of operation; on
x86 there's packssdw in MMX/SSE2.
I think something like a saturated cast could be quite useful. There
are already cast methods in Long/Integer/Short, so what do you think
about adding saturated casts to that API?
Those could be intrinsified to use MMX/SSE2 if available.
If that would be too specific, how hard would it be to add this kind of
optimization to HotSpot?
How far does SIMD support in HotSpot go? (I read some time ago that
there have been some optimizations.) If SIMD were supported, 4 casts
could be done in a single cycle :)
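To make the SIMD idea concrete, here is a scalar sketch of what a
batched saturated cast would compute. Each element gets the signed
saturation that packssdw applies per lane. The class and method names
are hypothetical, and nothing here implies that HotSpot vectorizes this
loop; it only shows the semantics a vectorized version would need to
preserve.

```java
// Hypothetical batch helper; names are illustrative only.
final class BatchClamp {
    private BatchClamp() {}

    // Saturates every int in src to the signed 16-bit range and stores
    // it in dst. Assumes dst.length >= src.length.
    static void pack(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            int x = src[i];
            dst[i] = (short) Math.max(Short.MIN_VALUE,
                                      Math.min(Short.MAX_VALUE, x));
        }
    }
}
```

A protocol-generation method that writes several coordinates at once
could collect them in an int[] and call pack once, instead of clamping
each value separately.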