Value type hash code

David Simms david.simms at oracle.com
Tue Apr 10 14:49:06 UTC 2018


After prototyping "hashCode()" for value types, here's a few 
observations and thoughts...

  * "The general contract of hashCode" [1] is unchanged.
  * The default implementation, if no user implementation is provided,
    is assumed to be completely based upon the entire contents of the
    value (there's nothing else to go on).
      o The current "Object" implementation, both its generation and
        object header hash caching are completely inappropriate for
        value types.
      o The VM cannot pretend to know one field is more significant than
        another.
      o Large values will benefit from user implementation to provide
        efficiency.
      o Whilst the VM may provide a default implementation for safety, a
        "javac" generated method would be optimal (can be optimized by
        the JIT, includes inlining).
  * Values containing references whose contents are mutable pose a
    problem, their hash code is only as stable as the contents of the
    whole object graph.
      o Objects may suffer similar problems, difficult to say this is
        any more important for values. Except to say values are supposed
        to be "immutable" but references may break this quality, perhaps
        "javac" could warn when value fields are mutable objects (not
        always possible, e.g. field reference to an interface).

I assume a the default implementation should look something like this 
(only with concrete fields, not reflection):

         int hc = 0;
         for (Field field : val.getClass().getDeclaredFields()) {
             if (Modifier.isStatic(field.getModifiers())) continue;

             // Using the generic JDK hash for all types
             hc = (31 * hc) + Objects.hashCode(field.get(val));
         }
         return hc;

This algorithm assumes the VM implements calls to reference field's 
hashCode(), and encodes primitives the same as their boxed JDK 
counter-parts (e.g. "Long.hashCode(long l)" does not generically hash 
two int size chunks, rather it xors hi and lo, Boolean is another 
interesting example 1231 || 1237). Unclear if this is actually 
important...however, this example:

     final __ByValue class MyInt implements Comparable<MyInt> {
         final int value;
         //....
     }

The user is free to explicitly delegate to "Integer.hashCode(int val)", 
but is it just more natural that the default works this way ? 
Alternative VM implementations might hash over value data payload 
including field padding. With h/w support (suitable checksum 
instruction) there might be some performance gain for large values, but 
then if you introduce object references, said h/w support would get 
broken. Said implementation would be dependent on field layout, and not 
give the same result on different platforms. Whilst the Javadoc states 
hashing "need not remain consistent from one execution of an application 
to another execution of the same application." [1], I'm wondering how 
many folks rely on consistent hashing, more than nobody I would fear. 
Lastly hashing large amounts of data per value seems an unlikely general 
use-case.


Cheers
/David Simms

[1] 
https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html#hashCode()




More information about the valhalla-dev mailing list