[10?] RFR: 8193128: Reduce number of implementation classes returned by List/Set/Map.of()

Claes Redestad claes.redestad at oracle.com
Wed Dec 6 20:21:55 UTC 2017


please help review this patch to consolidate the number of 
implementation classes returned by the static collection factories:


I set out to explore options for addressing small inefficiencies we've 
been running into, the latest after replacing use of @Stable arrays in 
java.lang.invoke with List.of() (JDK-8184777):

- List.indexOf deferred to the iterator in AbstractList, which check for 
concurrent modification
- Warmup takes a bit longer due to having to warm up four different 
classes and associated methods
- Slowdowns in certain code paths attributable to certain calls becoming 


The benchmark explores how call-sites behave when performing some naive 
operations on a few different Lists.

For every benchmark using List.of() there's a variant using ArrayList 
for comparison:


Benchmark                             Mode  Cnt    Score Error   Units
ListMorphism.finalGetFromArrayList   thrpt   25   92.659 ±  3.058 ops/us
ListMorphism.finalGetFromList        thrpt   25  335.245 ±  0.244 ops/us 
# 3.6x
ListMorphism.finalSumSizesArrayList  thrpt   25  245.020 ±  0.106 ops/us
ListMorphism.finalSumSizesList       thrpt   25  335.107 ±  0.439 ops/us 
# 1.4x
ListMorphism.getFromArrayList        thrpt   25   70.343 ±  0.972 ops/us
ListMorphism.getFromList             thrpt   25   37.121 ±  0.135 ops/us 
# 0.53x
ListMorphism.getFromArrayList12      thrpt   25 109.890 ±  0.286  ops/us
ListMorphism.getFromList12           thrpt   25  109.552 ± 6.972  ops/us 
# 1.0x
ListMorphism.sumSizesArrayList       thrpt   25  131.467 ±  4.672 ops/us
ListMorphism.sumSizesList            thrpt   25   57.877 ±  0.102 ops/us 
# 0.45x
ListMorphism.sumSizesArrayList12     thrpt   25 208.652 ± 11.294  ops/us
ListMorphism.sumSizesList12          thrpt   25  227.269 ± 0.961  ops/us 
# 1.1x

The good: When dealing with List literals (the final* benchmarks), 
List.of() allows really nice speed-ups compared to ArrayList.

The bad: When not used as a literal, List.of() implementations at 
call-sites can cause a substantial slowdown (megamorphic dispatch)

The ugly:

After some explorations[1], I narrowed in on the following experiment: 

Basically: Merge List1 and List2 into a single class, List12, merge 
List0 into ListN (List0 has a singleton instance, so might as well be an 
instance of ListN). Same for Set0,1,2,N. Map0 is merged into MapN.

This strikes a balance between throughput, footprint and slightly better 
startup/warmup behavior.

According to jol estimates[2] the size of List12 is the same as both 
List1 and List2 in the current JDK implementation. Set12 is footprint 
neutral compared to Set2 on all platforms but lose a few bytes on 64-bit 
VMs compared to Set1.

Benchmark                             Mode  Cnt    Score   Error Units
ListMorphism.finalGetFromArrayList   thrpt   25   93.046 ± 0.569 ops/us
ListMorphism.finalGetFromList        thrpt   25  335.280 ± 0.154 ops/us 
# 3.6x
ListMorphism.finalSumSizesArrayList  thrpt   25  244.595 ± 1.085 ops/us
ListMorphism.finalSumSizesList       thrpt   25  303.351 ± 0.182 ops/us 
# 1.2x
ListMorphism.getFromArrayList        thrpt   25   70.631 ± 0.070 ops/us
ListMorphism.getFromList             thrpt   25   73.258 ± 2.955 ops/us 
# 1.04x
ListMorphism.getFromArrayList12      thrpt   25 109.921 ± 0.096  ops/us
ListMorphism.getFromList12           thrpt   25  127.392 ± 0.088  ops/us 
# 1.16x
ListMorphism.sumSizesArrayList       thrpt   25  131.393 ± 4.882 ops/us
ListMorphism.sumSizesList            thrpt   25  107.686 ± 5.286 ops/us 
# 0.82x
ListMorphism.sumSizesArrayList12     thrpt   25  212.350 ± 0.134  ops/us
ListMorphism.sumSizesList12          thrpt   25  198.778 ± 0.479  ops/us 
# 0.94x

The experiment has a flag to change number of specialized List/Set/Map 
classes (-Djdk.ImmutableCollections.specializations=0|1|2, default=2).

1 specialization (List1 + ListN, Set1 + SetN) is more or less the same 
as 2, some ~1-2% improvements, mainly in sumSizes micros.

0 specializations (List-, Set, MapN only) achieves a small increase in 
throughput on some micros by ensuring callsites stay monomorphic, but 
it's not very substantial by my measures (~5%, but mostly in sumSizes 

Keeping the footprint more or less the same, while evening out a few 
rough edges and improving startup and static footprint seems like the 
overall best option. An alternative would be to keep the experimental 
flag, but I don't think a 5% gain on micros warrants the extra 
maintenance burden and testing that entails.

The proposed patch is more or less this experiment with 2 
specializations, but having removed the flag and code movement needed to 
support it removed (along with a fix in the writeReplace methods of 



[1] Older experiments:
  -- simply merge L0 into LN (still have L1, L2 and LN) - nothing really 
changed, though

  -- L0 merged into LN, drop L2. Substantial performance boost on 
micros. Footprint drop for 2-element lists.

  -- L0+L1+L2+LN merged into one implementation. Same footprint with a 
single class, but loses a lot on throughput in micros.

  -- L0+L1+LN merged, drop L2. Simplification of the previous. Like the 
list1N.0 experiment in footprint, but a loss in throughput on all measures.

No approach seemed a win across the board here: we either had to accept 
a footprint reduction, or see throughput suffer dramatically.

[2] http://openjdk.java.net/projects/code-tools/jol/

More information about the core-libs-dev mailing list