RFR (XS) CR 8058643: (str) Re-examine hashCode implementation
peter.levart at gmail.com
Wed Sep 24 20:25:41 UTC 2014
On 09/22/2014 04:29 PM, Aleksey Shipilev wrote:
> Since there is no performance impact for this touchup (char -> int
> promotion is happening anyway), and no functionality changes (char ->
> int promotion zero-extends), I haven't developed any new tests, or
> re-spinned any existing ones.
String.hashCode() caches the result, so repeatable call to same String
instance is faster for 2nd and further invocations. But the computation
of hash code itself could be accelerated for a factor of 2 or more on
todays CPUs. How? By parallelizing it. And I don't mean computing it in
Here is a surprising result of a simple benchmark which computes hash
code of a 128 character string in 6 different ways:
Benchmark Mode Samples Score Score error
j.t.HashBench.hashCode thrpt 8 8420103.795 162447.069 ops/s
j.t.HashBench.hashCode0 thrpt 8 8439058.660 2842.755 ops/s
j.t.HashBench.hashCode1 thrpt 8 13809510.573 337888.132 ops/s
j.t.HashBench.hashCode2 thrpt 8 15543687.568 716152.160 ops/s
j.t.HashBench.hashCode3 thrpt 8 18173224.431 49410.256 ops/s
j.t.HashBench.hashCode3x thrpt 8 8543020.232 18158.686 ops/s
The "hashCode" benchmark is invoking .hashCode() on a new String using a
'master' string as a constructor argument, so new string is constructed
with shared char array, but with no pre-computed hashCode.
Compare it to "hashCode0" benchmark which replicates String's hasCode()
implementation and has a comparable score. This is to set the baseline.
"hashCode1" benchmark is already quite faster.
"hashCode2" is faster than "hashCode1"
"hashCode3" is faster than "hashCode2"
... i haven't tried it further. It should stop somewhere, dependent on
the CPU of course.
To illustrate this has nothing to do with manual vs. automatic loop
unrolling (as it might appear by 1st glance of the code), "hashCode3x"
benchmark is equivalent to "hashCode3" regarding loop structure, but
it's result is comparable to "hashCode" and "hashCode0" benchmarks.
It appears that "hashCode1,2,3" are utilizing CPU's pipeline which can
perform multiple (2, 3, 4) multiplications in parallel, whereas
"hashCode0,3x" can't, since each iteration of loop computes the next
result from the result of previous iteration.
More information about the core-libs-dev