[intrinsics] performance improvements for the intrinsified version of Objects::hash

Vicente Romero vicente.romero at oracle.com
Thu Feb 28 13:12:11 UTC 2019

On 2/27/19 8:18 PM, Alex Buckley wrote:
> Believing that the second column is intended to be "Intrinsics_02_26", 
> not "Intrinsics_02_22":

That's correct, sorry for the mistake in the column naming.

> The speedups for reference variables get worse with more arguments 
> (though they may still be faster than vanilla invocation for a good 
> while), and the speedups for primitive variables get better with more 
> arguments.
> One metric is how many variables can be passed and still have 
> intrinsification offer a speedup relative to vanilla invocation. (The 
> cliff between 60 and 70.) Another metric is how many variables can be 
> passed before the speedup stops growing, even if intrinsification is 
> always faster than vanilla invocation. (The global maximum of 
> performance, between 10 and 40.) Presumably, each metric is governed 
> by a different factor.

Right, good analysis. I will do some more research to see where the
execution time is going.
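
The two metrics Alex describes can be computed mechanically from a table of per-arity timings. Below is a minimal sketch, assuming each measurement is an average time per operation (lower is better) keyed by argument count. The class and method names (`SpeedupMetrics`, `breakEvenArity`, `peakSpeedupArity`) are hypothetical and not part of the attached benchmark, and the numbers in `main` are synthetic placeholders, not the attached results.

```java
import java.util.Map;
import java.util.TreeMap;

class SpeedupMetrics {
    /** Speedup of the intrinsified version; values above 1.0 mean it is faster. */
    static double speedup(double[] times) {
        // times[0] = vanilla time, times[1] = intrinsified time
        return times[0] / times[1];
    }

    /**
     * Metric 1: the largest measured arity up to which intrinsification is
     * always faster than vanilla. Returns -1 if it is never faster.
     */
    static int breakEvenArity(TreeMap<Integer, double[]> timings) {
        int last = -1;
        for (Map.Entry<Integer, double[]> e : timings.entrySet()) {
            if (speedup(e.getValue()) <= 1.0) {
                break; // first arity where vanilla wins: the "cliff"
            }
            last = e.getKey();
        }
        return last;
    }

    /** Metric 2: the measured arity with the largest speedup, or -1 if empty. */
    static int peakSpeedupArity(TreeMap<Integer, double[]> timings) {
        int best = -1;
        double bestSpeedup = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Integer, double[]> e : timings.entrySet()) {
            double s = speedup(e.getValue());
            if (s > bestSpeedup) {
                bestSpeedup = s;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic placeholder timings {vanilla, intrinsified}; NOT the attached data.
        TreeMap<Integer, double[]> timings = new TreeMap<>();
        timings.put(10, new double[] {100.0, 40.0});
        timings.put(40, new double[] {400.0, 100.0});
        timings.put(60, new double[] {600.0, 550.0});
        timings.put(70, new double[] {700.0, 800.0});
        System.out.println("break-even arity: " + breakEvenArity(timings));
        System.out.println("peak speedup arity: " + peakSpeedupArity(timings));
    }
}
```

Keeping the two metrics as separate functions mirrors the point that each is presumably governed by a different factor.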

> Alex


> On 2/26/2019 8:28 PM, Vicente Romero wrote:
>> Hi all,
>> I have investigated further the degradation of the intrinsified
>> version of Objects::hash for reference types. I have taken performance
>> measurements for different numbers of arguments. Please see the results
>> attached. At least on my PC it seems like there is a cliff from 60 to 70
>> arguments. Up to 60 the intrinsified version is faster than vanilla
>> JDK13, but from 70 on the intrinsified version starts being slower.
>> Interestingly, if the current implementation only starts being worse
>> at 70 non-primitive arguments, that seems like a very good
>> compromise.
>> Thanks,
>> Vicente
>> On 2/26/19 8:49 PM, Vicente Romero wrote:
>>> Hi all,
>>> I have just pushed [1] which improves the performance of the
>>> intrinsified version of Objects::hash in almost all of our performance
>>> test cases. This is a big improvement compared to the previous state
>>> but there is still work to be done. Please find attached a file with
>>> the benchmark results. It includes the performance numbers obtained
>>> with the intrinsics repo as of 02/22 plus the ones obtained, almost
>>> now :), after pushing [1]. As can be seen, there is a noticeable
>>> improvement in the performance. In the last performance measurement we
>>> found a noticeable degradation in performance for a large number of
>>> arguments (~100), even for primitive types. Patch [1] improves the
>>> performance for both primitive and reference types, with the difference
>>> that now the performance is much better than vanilla JDK13 for
>>> primitive types but it is still worse than vanilla for reference
>>> types. Still, we are in better shape now compared to the state as of
>>> 02/22. Stay tuned :)
>>> Thanks,
>>> Vicente
>>> [1] http://hg.openjdk.java.net/amber/amber/rev/0f40d5752eb9
>>> On 2/22/19 4:46 PM, Vicente Romero wrote:
>>>> Hi,
>>>> To complete the picture please find attached the performance results
>>>> for Objects.hash for a number of experiments. In general they don't
>>>> look as good as the ones for String::format. It seems like
>>>> there is not much gain unless the number of parameters is large and
>>>> all the parameters are constants. This is understandable because in
>>>> that case the compiler generates an LDC of the result. In all other
>>>> cases the performance is just a bit better or a lot worse.
>>>> Thanks,
>>>> Vicente
>>>> On 2/22/19 12:33 PM, Vicente Romero wrote:
>>>>> Hi,
>>>>> I have executed some performance tests on the intrinsics code to
>>>>> compare the before and after. Please find the benchmark results and
>>>>> the JMH based benchmark attached. This benchmark is based on a
>>>>> previous one written by Hannes. The benchmark compares the execution
>>>>> between the JDK built from [1], referred to here as JDK13, and [2],
>>>>> which is the amber repo, branch `intrinsics-project`.
>>>>> Some conclusions from the benchmark results:
>>>>>   * the intrinsified code is faster than the vanilla JDK13 code
>>>>>     in all cases for which intrinsified code is produced
>>>>>   * there are wide variations though
>>>>> For example, the test `testStringFormatBoxedArray`, which is
>>>>> basically benchmarking the performance of `String.format("%s: %d ",
>>>>> args);` where args is `static final Object[] args = { "Bob", i23
>>>>> };`, shows basically no visible gain, as in this case the
>>>>> intrinsification is bailing out and producing the same code as vanilla
>>>>> JDK13. This result is expected. The next test with not so much gain
>>>>> is `testStringFormat1ConstantFloat`, which is testing:
>>>>>     `String.format("%g", 1.0)`
>>>>> Here the execution is ~2.5 times faster in the intrinsified version,
>>>>> which is nothing compared to `testStringFormat1ConstantStr`, which is
>>>>> ~40 times faster. Another interesting conclusion is that the improvement
>>>>> fades out with the number of parameters for some cases but stays
>>>>> constant for others. For example it is as fast to concatenate 1 or
>>>>> 100 strings, but formatting one primitive int is ~45 times faster vs. a
>>>>> ~3.5 times improvement when formatting a hundred.
>>>>> I have also attached the table I used to play with the numbers.
>>>>> Thanks,
>>>>> Vicente
>>>>> [1] http://hg.openjdk.java.net/jdk/jdk
>>>>> [2] http://hg.openjdk.java.net/amber/amber
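
To make the `testStringFormatBoxedArray` case concrete, here is a sketch of the two call shapes involved. It assumes `i23` is a boxed `Integer` holding 23; the actual benchmark source is in the attachment and is not reproduced here, and the class and method names are hypothetical. When the arguments arrive as a pre-built `Object[]`, the call site only shows an array, which is presumably why the intrinsification bails out; when the arguments are spelled out, the static type of each one is visible at the call site. Both shapes produce the same string.

```java
class FormatCallShapes {
    // Assumption: i23 is a boxed Integer with value 23 (name taken from the message).
    static final Integer i23 = 23;
    static final Object[] args = { "Bob", i23 };

    /** Arguments passed as an existing array: the shape used by the benchmark. */
    static String viaArray() {
        return String.format("%s: %d ", args);
    }

    /** Same arguments spelled out individually at the call site. */
    static String viaVarargs() {
        return String.format("%s: %d ", "Bob", i23);
    }

    public static void main(String[] unused) {
        // Brackets make the trailing space of the format string visible.
        System.out.println("[" + viaArray() + "]");
        System.out.println("same result: " + viaArray().equals(viaVarargs()));
    }
}
```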

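The remark about constant parameters and `LDC` can be illustrated with plain Java. `Objects.hash` is documented to produce the same value as `Arrays.hashCode(Object[])` over its arguments, that is, a 31-based polynomial over the elements' hash codes starting from 1. So when every argument is a constant whose `hashCode` is fully specified (boxed `int` values, `String` literals), the result can be computed ahead of time and loaded as a single constant. The sketch below folds the hash by hand; `foldedHash` is a hypothetical helper for illustration, not the code the intrinsics branch generates.

```java
import java.util.Objects;

class ConstantHashFolding {
    /** Recomputes what Objects.hash returns, given the elements' hash codes in order. */
    static int foldedHash(int... elementHashes) {
        int result = 1;
        for (int h : elementHashes) {
            result = 31 * result + h; // int overflow wraps, same as the library
        }
        return result;
    }

    public static void main(String[] args) {
        // Integer.hashCode is the int value itself; String.hashCode is specified.
        int atRuntime = Objects.hash(1, 2, "abc");
        int folded = foldedHash(1, 2, "abc".hashCode());
        System.out.println(atRuntime == folded); // prints "true"
    }
}
```

With non-constant arguments no such folding is possible, which is consistent with the much smaller gains reported for those cases.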