[intrinsics] performance improvements for the intrinsified version of Objects::hash

Vicente Romero vicente.romero at oracle.com
Wed Feb 27 01:49:49 UTC 2019

Hi all,

I have just pushed [1], which improves the performance of the 
intrinsified version of Objects::hash in almost all of our performance 
test cases. This is a big improvement compared to the previous state, but 
there is still work to be done. Please find attached a file with the 
benchmark results. It includes the performance numbers obtained with the 
intrinsics repo as of 02/22 plus the ones obtained, almost now :), after 
pushing [1]. As can be seen, there is a noticeable improvement in 
performance. In the last performance measurement we found a noticeable 
degradation in performance for a large number of arguments (~100), even 
for primitive types. Patch [1] improves the performance for both 
primitive and reference types, with the difference that the 
performance is now much better than vanilla JDK13 for primitive types but 
still worse than vanilla for reference types. Even so, we are in 
better shape now than we were as of 02/22. Stay tuned :)


[1] http://hg.openjdk.java.net/amber/amber/rev/0f40d5752eb9

On 2/22/19 4:46 PM, Vicente Romero wrote:
> Hi,
> To complete the picture, please find attached the performance results 
> for Objects.hash for a number of experiments. In general they don't 
> look as good as the ones for String::format. It seems that 
> there is not much gain unless the number of parameters is large and all 
> the parameters are constants. This is understandable because in that 
> case the compiler generates an LDC of the result. In all other cases the 
> performance is just a bit better or a lot worse.
> Thanks,
> Vicente
> On 2/22/19 12:33 PM, Vicente Romero wrote:
>> Hi,
>> I have executed some performance tests on the intrinsics code to 
>> compare the before and after. Please find the benchmark results and 
>> the JMH based benchmark attached. This benchmark is based on a 
>> previous one written by Hannes. The benchmark compares the execution 
>> between the JDK built from [1], referred to here as JDK13, and [2], 
>> which is the amber repo, branch `intrinsics-project`.
>> Some conclusions from the benchmark results:
>>   * in all cases for which intrinsified code is produced, it is
>>     faster than the legit (JDK13 vanilla) code
>>   * there are wide variations though
>> For example for the test: `testStringFormatBoxedArray` which is 
>> basically benchmarking the performance of: `String.format("%s: %d ", 
>> args);` where args is: `static final Object[] args = { "Bob", i23 
>> };`, there is basically no visible gain, as in this case the 
>> intrinsification is bailing out and producing the same code as vanilla 
>> JDK13. This result is expected. The next test with not so much gain 
>> is `testStringFormat1ConstantFloat`, which is testing:
>>     `String.format("%g", 1.0)`
>> the execution is ~2.5 times faster in the intrinsified version, but 
>> that is nothing compared to `testStringFormat1ConstantStr`, which is 
>> ~40 times faster. Another interesting conclusion is that the improvement 
>> fades out with the number of parameters for some cases but stays 
>> constant for others. For example, it is as fast to concatenate 1 or 
>> 100 strings, but formatting one primitive int is ~45 times faster vs. a 
>> ~3.5x improvement when formatting a hundred.
>> I have also attached the table I used to play with the numbers.
>> Thanks,
>> Vicente
>> [1] http://hg.openjdk.java.net/jdk/jdk
>> [2] http://hg.openjdk.java.net/amber/amber
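
The 02/22 reply quoted above attributes the large gains for all-constant arguments to the compiler generating an LDC of the result. As a minimal sketch of why that is possible, the snippet below recomputes what `Objects.hash` returns for int arguments using the recurrence documented for `Arrays.hashCode` (start at 1, then multiply by 31 and add each element's hash). The class name `HashFoldingSketch` and the helper `foldedHash` are hypothetical and only illustrate the arithmetic; this is not the code the intrinsics branch emits.

```java
import java.util.Objects;

public class HashFoldingSketch {
    // Hypothetical helper: the recurrence documented for Arrays.hashCode(Object[]),
    // written out for int arguments (the hash code of a boxed Integer is its value).
    static int foldedHash(int... values) {
        int result = 1;
        for (int v : values) {
            result = 31 * result + v;
        }
        return result;
    }

    public static void main(String[] args) {
        // When every argument is a constant, all inputs are known up front,
        // so the whole expression can be reduced to a single int value.
        int viaLibrary = Objects.hash(1, 2);
        int viaFolding = foldedHash(1, 2);
        System.out.println(viaLibrary + " " + viaFolding); // prints "994 994"
    }
}
```

With variables or arbitrary references the inputs are not known until run time, so no such precomputation is available, which is consistent with the much smaller ratios reported for those cases.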

Test                                           JDK13 INTRINSICS 02/22  JDK13 INTRINSICS 02/26  JDK13 Vanilla  Intrinsics_02_22 / Vanilla  Intrinsics_02_26 / Vanilla
FormatterBenchmark.testHash100DoubleConstants  42191.208               39412.877               23.385         1804X                       1685X
FormatterBenchmark.testHash100IntConstants     42120.514               41763.538               21.588         1951X                       1934X
FormatterBenchmark.testHash100IntVariables     3.993                   697.194                 26.592         0.15X                       26X
FormatterBenchmark.testHash100References       3.425                   31.835                  430.67         0.007X                      0.07X
FormatterBenchmark.testHash1DoubleConstant     41651.197               42981.061               21187.706      2X                          2X
FormatterBenchmark.testHash1IntConstant        38128.38                42325.322               41139.332      1X                          1X
FormatterBenchmark.testHash1IntVariable        42564.255               41148.668               42798.895      1X                          1X
FormatterBenchmark.testHash1Reference          30692.986               37773.7                 36007.401      0.8X                        1X
FormatterBenchmark.testHash2DoubleConstants    36643.845               42804.629               4502.75        8X                          9X
FormatterBenchmark.testHash2IntConstants       42661.835               43037.199               9027.443       4.7X                        4.7X
FormatterBenchmark.testHash2IntVariables       41573.286               19075.237               9018.512       4.6X                        2X
FormatterBenchmark.testHash2References         35126.523               18365.732               12513.845      2.8X                        1.4X
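
The two ratio columns are each row's intrinsics score divided by the vanilla score for the same test. The sketch below redoes that division for the two 100-argument rows that changed most between 02/22 and 02/26. The class name `SpeedupRatio` is hypothetical, and the sketch assumes that higher scores are better, which the table does not state explicitly but the ratio columns suggest.

```java
import java.util.Locale;

public class SpeedupRatio {
    // Ratio of an intrinsics score to the vanilla score for the same test.
    // Assumption: higher scores are better, so a ratio above 1 is a speedup.
    static double ratio(double intrinsicsScore, double vanillaScore) {
        return intrinsicsScore / vanillaScore;
    }

    public static void main(String[] args) {
        // Scores copied from the testHash100IntVariables row.
        System.out.printf(Locale.ROOT, "100IntVariables 02/22: %.3fX%n", ratio(3.993, 26.592));
        System.out.printf(Locale.ROOT, "100IntVariables 02/26: %.3fX%n", ratio(697.194, 26.592));
        // Scores copied from the testHash100References row.
        System.out.printf(Locale.ROOT, "100References   02/22: %.3fX%n", ratio(3.425, 430.67));
        System.out.printf(Locale.ROOT, "100References   02/26: %.3fX%n", ratio(31.835, 430.67));
    }
}
```

Read this way, the 100-int-variable case moves from well below vanilla to well above it, while the 100-reference case improves by roughly an order of magnitude but stays below vanilla.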

