RFR JDK-8011200 (was 7143928) : (coll) Optimize for Empty ArrayList and HashMap

Mike Duigou mike.duigou at oracle.com
Wed Apr 10 18:12:50 UTC 2013

On Apr 9 2013, at 19:56 , Martin Buchholz wrote:

> Mike, thanks.
> I don't see anything wrong in this version, although the ongoing complexification and special-case-ification (with attendant risk of bugs) of ArrayList and HashMap, the two most didactically important classes, continue to bother me.  This change continues to feel marginal.  

It was surprising to find just how many ArrayList and HashMap instances were empty and/or never used. Unfortunately it's harder to globally fix applications than it is to and special case code to ArrayList and HashMap.

> Anyways, this is OK with me if it's OK with Alan.
> The original code that shifts by one every time has a certain elegance, and because it's rare to need more than one doubling, is also hard to beat performance-wise.
>  546             while (newCapacity < targetCapacity)
>  547                 newCapacity <<= 1;
I will restore it.

> There are now so many "if (isEmpty())" checks that I wonder again whether it would be better to use a null for the empty table, since null checks are closer to free.

The use of an empty array rather than null was suggested by John Rose who said:

> I recommend an empty array rather than null as a sentinel value for two reasons:
> 1. The JVM prefers to merge null checks into load or store instructions (so-called "implicit null checks") because it removes an explicit branch. But it only does so if the probability of nulls is zero or very low. But using null as a sentinel for common states (e.g., empty collection) defeats this optimization.
> 2. For power-of-two sized structures (HashMap) we can optimize away an array range check in the presence of a zero-length check. 
> Since most uses of a variable-sized collection load and test the array length, the sentinel check can easily be overloaded onto this test. If null is not used, then the (safety-mandated) null check is (usually) merged into the load of the length. If the table is power-of-two-sized, then only the zero check remains, and the array range check may be removed. This is thought to be the best code for a frequent load from a possibly-empty collection.
> Mike asked, "what about empty collection?" This is a reasonable thing to use, but it has a cost. The JVM uses inline caches and type profiles to simplify its optimized code; these techniques "win" when at a given use point (individual call to Map.get, for example) there is only one class present, even if the interface is totally general. (This is the so-called "monomorphic" case.) If the application uses (say) only HashMaps for both empty and non-empty maps, then this optimization can win big. It can be broken, on the other hand, if the application begins to use one other type for some other case (such as empty maps). In these cases, it is better to overload the "am I empty?" test on some other loaded value, such as a null or (better) an array length. 


More information about the core-libs-dev mailing list