performance surprise with Object.hashCode()
john.r.rose at oracle.com
Mon May 13 14:31:33 PDT 2013
On May 13, 2013, at 11:46 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
> On 05/13/2013 10:35 PM, Aleksey Shipilev wrote:
>> Note that the code generated in the o_o and o_i cases is structurally
>> indistinguishable, but o_i naturally goes through the slow path. I
>> wonder why we are losing the information that the receiver type is
>> Integer in the o_i case, and skipping the proper devirtualization...
> In fact, I *do* think the Object.hashCode intrinsic plays a trick on us.
> Linux x86_64, JDK 8b88:
> o.s.g.a.AndyBench.eee: 0.997 +- 0.020 nsec/op
> o.s.g.a.AndyBench.i_i: 1.288 +- 0.035 nsec/op
> o.s.g.a.AndyBench.o_o: 2.709 +- 0.129 nsec/op
> o.s.g.a.AndyBench.o_i: 4.925 +- 0.098 nsec/op
> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_hashCode:
> o.s.g.a.AndyBench.eee: 0.992 +- 0.012 nsec/op
> o.s.g.a.AndyBench.i_i: 1.288 +- 0.024 nsec/op
> o.s.g.a.AndyBench.o_o: 27.888 +- 1.326 nsec/op
> o.s.g.a.AndyBench.o_i: 1.623 +- 0.033 nsec/op
> See how the o_i case performs much better. o_o naturally takes the
> hit with the intrinsic disabled.
Nice use of DisableIntrinsic!
The intrinsic for the "I must dispatch" case of o_i is slower than a monomorphic inline cache (MIC) to Integer.hashCode.
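For readers without the benchmark source at hand, here is a minimal sketch of what the three call shapes presumably look like. The class, field names, and method bodies are assumptions inferred from the benchmark labels, not the actual AndyBench code, and eee is left out because its meaning is not given in this thread. It shows only the static and dynamic types involved and is not itself a benchmark.

```java
public class CallShapes {
    // Hypothetical stand-ins for the benchmark state (not the real AndyBench fields).
    // Non-final so the values are not trivially compile-time constants.
    static Integer boxed = 42;
    static Object plain = new Object();
    static Object boxedAsObject = boxed;

    // i_i: static type Integer, receiver Integer.
    // Integer is final, so the target is known without any profiling.
    static int i_i() {
        return boxed.hashCode();
    }

    // o_o: static type Object, receiver a plain Object.
    // This resolves to Object.hashCode(), the identity hash.
    static int o_o() {
        return plain.hashCode();
    }

    // o_i: static type Object, receiver actually an Integer.
    // The call site looks like o_o, but the real target is Integer.hashCode().
    static int o_i() {
        return boxedAsObject.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("i_i = " + i_i());
        System.out.println("o_i = " + o_i());
        System.out.println("o_o is identity hash: "
                + (o_o() == System.identityHashCode(plain)));
    }
}
```

The o_i and o_o call sites are identical at the bytecode level (an invokevirtual of Object.hashCode); only the receiver type seen at run time differs, which is why the type profile matters here.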
However, this should not matter. If a MIC would save the day, then pre-compilation profiling should do even better (allowing inlining).
Something about this code is overcoming the intended effect of UseTypeProfile. Often it is a lack of warmup. In this case it may be a bug in C2, if find_intrinsic etc. is used to generate the graph before the type profile (TypeProfileMajorReceiverPercent etc.) is consulted.
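To make the expected outcome concrete, here is a hand-written sketch of roughly what a profile-guided, guarded inline amounts to for the o_i shape. This is an illustration only: the class and method names are hypothetical, and the code C2 actually emits (including how it handles the guard failure) is not shown in this thread.

```java
public class GuardedHash {
    // Hand-written approximation of a guarded inline of Integer.hashCode().
    // Assumes the type profile saw Integer as the dominant receiver.
    static int guardedHash(Object o) {
        if (o.getClass() == Integer.class) {
            // Fast path: the body of Integer.hashCode(), which returns the int value.
            return ((Integer) o).intValue();
        }
        // Fallback: ordinary virtual dispatch for any other receiver type.
        return o.hashCode();
    }

    public static void main(String[] args) {
        Object boxed = Integer.valueOf(7);
        Object text = "abc";
        System.out.println(guardedHash(boxed) == boxed.hashCode());
        System.out.println(guardedHash(text) == text.hashCode());
    }
}
```

With the guard in place the common case needs neither a call nor the generic Object.hashCode path, which is the sense in which profiling plus inlining should beat a monomorphic inline cache.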
More information about the hotspot-compiler-dev mailing list