Faster HashMap implementation
alex14n at gmail.com
Wed Jun 3 16:21:51 PDT 2009
While playing with scala programming language I've come to a HashMap
that appears to be faster than current java.util.HashMap, also with a lower
Alex Miller advised me to post about it in this maillist.
I've put current implementation (both scala and java) and some benchmark
results at github:
Current java version implements most of the external HashMap behaviour
inluding serialization, but does not throw ConcurrentModificationException
and does not implement internal methods that can be requeired for some
Main trick is in storing all data in 2 arrays without any HashEntry objects.
One is array of Objects with keys and values, another in a complex array of
Index array has 2 parts. First part contains hashcode-to-index mappings,
second part is index-to-index mapping for elements with the same hashcode.
The rest bits are used for:
1. 'Deleted' flag to mark empty spots after key removal
(other bits part of such values is used as deleted-linked-list indices)
2. 'End-of-list' flag allowing not to access next index if this element is
3. Some bits of hashcode used to compare with hashcode of the key we are
Finding an existing value for a key has approximately the same speed as the
HashMap implementation. It is:
1. Computing element hashcode,
2. Accessing array to get index/address for this hashcode
3a. Current HashMap access HashEntry object with key/hashcode/value/next
4a. In my implementation hashcode and next are packed in index array bits,
key and value are stored in adjacent cells of the second array.
When looking for missing key my implementation can be twice faster in some
since it's only one random memory access (element index array). If it's
or its hashcode bits are not equal to hashcode bits of the key being
and it's `end-of-list` bit is set - it is enough to stop searching. Current
needs 2 random memory reads: (1) array cell with address and (2) HashEntry
at that address.
Insertion of a new key does not require any memory allocation (new HashEntry
since all arrays are pre-allocated except for resize operation. Resize
Arrays.copyOf to a twice large array and almost sequental rehashing -
reading one array
and writing it to either the same index or with `1` in highest bit.
When adding several key/values they are sequentally written to key/values
and sequental memory read is faster and more cache-friendly.
Having 2 arrays instead of a lot of HashEntry objects is more garbage
friendly and has significantly lower memory footprint.
Iteration over elements is a sequental array read which is faster.
To clone such HashMap we only need to clone 2 arrays with is also much
Elements order is preserved even after resize, only when some key is removed
it leaves an empty 'hole' that will be filled after next key insertion. Thus
order is more predictable and I think it's possible not to throw
in most situations. Iterator needs only 1 parameter - array index - and even
does not break iteration order.
But there can be problems. For example if we iterated over some key
that was deleted after that and then inserted again into higher array index,
it can eventually came up second time into iterator. But if we save highest
non-empty index that was at iteration start, and monitor insert/delete
operations (possible holes) in indices we've not iterated over yet,
we will not need to throw ConcurrentModificationException in most cases.
I've done some benchmarks that proves my assumptions:
* Reading existing key-value mapping has the same speed,
* Adding new mapping and looking for missing key is significantly faster.
But there can be some problems in my testing strategy
or benchmark can be somehow biased, so it's better to retest it
with some independent benchmarks.
Hope it will help to make java collections faster since such approach
(preallocated arrays instead of ...Entry objects) can be used to speedup
other collections. In my scala version both HashMap and HashSet
are implemented this way. It also stores primitive types like int
in primitive arrays which saves a lot of memory. But on one hand
scala has built-in boxed arrays, and on the other hand such array boxing
is slow. It also has to store keys and values in different arrays,
which gives extra random memory read and thus slower -
when keys and values are in adjacent cells of the same array
value will usually be pre-fetched when key is read.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the core-libs-dev