An Alternative Hashing Alternative
dl at cs.oswego.edu
Wed Jun 6 16:22:19 UTC 2012
I just posted the following to the concurrency-interest list.
I'll send a follow-up on tie-ins to core-lib issues next.
Finally acting on an old idea, I committed an update to
ConcurrentHashMap (currently only the one in our jdk8 preview
package, as jsr166e.ConcurrentHashMap8) that much more gracefully
handles maps that are huge or have many keys with colliding
hash codes. Internally, it uses tree-map-like structures to
maintain bins containing more nodes than would be expected
under ideal random key distributions over ideal numbers of bins.
Among other things, this provides graceful (O log(N)) degradation when
there are more than about a billion elements (which run out of 32bit
hash resolution and array bound limits). However, the map can only
do so if keys are Comparable, which is true of most keys in
practice -- Strings, Longs, etc. Without relying on Comparable,
we can't otherwise magically find any other means to further
resolve and organize keys after running out of bits, so performance
for non-Comparable keys is unaffected/unimproved.
This also provides a mechanism for coping with hostile usages in
which many keys (Strings in particular) are somehow constructed to
have the same hashCode, which can lead to algorithmic denial
of service attacks (because of linear-time bin searches) if
code using the map don't screen external inputs. The overflow-tree-based
strategy here is an alternative approach to adding secondary
randomly-seeded hash code methods ("hash32") to class String,
as has been committed recently to OpenJDK. But ConcurrentHashMap8
doesn't doesn't rely on this.
The use of overflow-bins also allowed a few other minor speedups
in more typical usages. A few more small improvements are still
left unfinished for now. (I'll be traveling and/or otherwise
committed for most of the next few weeks.)
Please give this a try and let me know about experiences.
More information about the core-libs-dev