[10?] RFR: 8193128: Reduce number of implementation classes returned by List/Set/Map.of()
martinrb at google.com
Wed Dec 6 21:20:15 UTC 2017
Guava struggled with this as well with their immutable collections. You
could look at their revision history (I haven't). Maybe they got rid of
SingletonImmutableList for the same reason?
On Wed, Dec 6, 2017 at 12:21 PM, Claes Redestad <claes.redestad at oracle.com>
> please help review this patch to consolidate the number of implementation
> classes returned by the static collection factories:
> I set out to explore options for addressing small inefficiencies we've
> been running into, the latest after replacing use of @Stable arrays in
> java.lang.invoke with List.of() (JDK-8184777):
> - List.indexOf deferred to the iterator in AbstractList, which check for
> concurrent modification
> - Warmup takes a bit longer due to having to warm up four different
> classes and associated methods
> - Slowdowns in certain code paths attributable to certain calls becoming
> Microbenchmark: http://cr.openjdk.java.net/~re
> The benchmark explores how call-sites behave when performing some naive
> operations on a few different Lists.
> For every benchmark using List.of() there's a variant using ArrayList for
> Benchmark Mode Cnt Score Error Units
> ListMorphism.finalGetFromArrayList thrpt 25 92.659 ± 3.058 ops/us
> ListMorphism.finalGetFromList thrpt 25 335.245 ± 0.244 ops/us
> # 3.6x
> ListMorphism.finalSumSizesArrayList thrpt 25 245.020 ± 0.106 ops/us
> ListMorphism.finalSumSizesList thrpt 25 335.107 ± 0.439 ops/us
> # 1.4x
> ListMorphism.getFromArrayList thrpt 25 70.343 ± 0.972 ops/us
> ListMorphism.getFromList thrpt 25 37.121 ± 0.135 ops/us
> # 0.53x
> ListMorphism.getFromArrayList12 thrpt 25 109.890 ± 0.286 ops/us
> ListMorphism.getFromList12 thrpt 25 109.552 ± 6.972 ops/us
> # 1.0x
> ListMorphism.sumSizesArrayList thrpt 25 131.467 ± 4.672 ops/us
> ListMorphism.sumSizesList thrpt 25 57.877 ± 0.102 ops/us
> # 0.45x
> ListMorphism.sumSizesArrayList12 thrpt 25 208.652 ± 11.294 ops/us
> ListMorphism.sumSizesList12 thrpt 25 227.269 ± 0.961 ops/us
> # 1.1x
> The good: When dealing with List literals (the final* benchmarks),
> List.of() allows really nice speed-ups compared to ArrayList.
> The bad: When not used as a literal, List.of() implementations at
> call-sites can cause a substantial slowdown (megamorphic dispatch)
> The ugly:
> After some explorations, I narrowed in on the following experiment:
> Basically: Merge List1 and List2 into a single class, List12, merge List0
> into ListN (List0 has a singleton instance, so might as well be an instance
> of ListN). Same for Set0,1,2,N. Map0 is merged into MapN.
> This strikes a balance between throughput, footprint and slightly better
> startup/warmup behavior.
> According to jol estimates the size of List12 is the same as both List1
> and List2 in the current JDK implementation. Set12 is footprint neutral
> compared to Set2 on all platforms but lose a few bytes on 64-bit VMs
> compared to Set1.
> Benchmark Mode Cnt Score Error Units
> ListMorphism.finalGetFromArrayList thrpt 25 93.046 ± 0.569 ops/us
> ListMorphism.finalGetFromList thrpt 25 335.280 ± 0.154 ops/us #
> ListMorphism.finalSumSizesArrayList thrpt 25 244.595 ± 1.085 ops/us
> ListMorphism.finalSumSizesList thrpt 25 303.351 ± 0.182 ops/us #
> ListMorphism.getFromArrayList thrpt 25 70.631 ± 0.070 ops/us
> ListMorphism.getFromList thrpt 25 73.258 ± 2.955 ops/us #
> ListMorphism.getFromArrayList12 thrpt 25 109.921 ± 0.096 ops/us
> ListMorphism.getFromList12 thrpt 25 127.392 ± 0.088 ops/us
> # 1.16x
> ListMorphism.sumSizesArrayList thrpt 25 131.393 ± 4.882 ops/us
> ListMorphism.sumSizesList thrpt 25 107.686 ± 5.286 ops/us #
> ListMorphism.sumSizesArrayList12 thrpt 25 212.350 ± 0.134 ops/us
> ListMorphism.sumSizesList12 thrpt 25 198.778 ± 0.479 ops/us
> # 0.94x
> The experiment has a flag to change number of specialized List/Set/Map
> classes (-Djdk.ImmutableCollections.specializations=0|1|2, default=2).
> 1 specialization (List1 + ListN, Set1 + SetN) is more or less the same as
> 2, some ~1-2% improvements, mainly in sumSizes micros.
> 0 specializations (List-, Set, MapN only) achieves a small increase in
> throughput on some micros by ensuring callsites stay monomorphic, but it's
> not very substantial by my measures (~5%, but mostly in sumSizes micros).
> Keeping the footprint more or less the same, while evening out a few rough
> edges and improving startup and static footprint seems like the overall
> best option. An alternative would be to keep the experimental flag, but I
> don't think a 5% gain on micros warrants the extra maintenance burden and
> testing that entails.
> The proposed patch is more or less this experiment with 2 specializations,
> but having removed the flag and code movement needed to support it removed
> (along with a fix in the writeReplace methods of List12/Set12)
>  Older experiments:
> -- simply merge L0 into LN (still have L1, L2 and LN) - nothing really
> changed, though
> -- L0 merged into LN, drop L2. Substantial performance boost on micros.
> Footprint drop for 2-element lists.
> -- L0+L1+L2+LN merged into one implementation. Same footprint with a
> single class, but loses a lot on throughput in micros.
> -- L0+L1+LN merged, drop L2. Simplification of the previous. Like the
> list1N.0 experiment in footprint, but a loss in throughput on all measures.
> No approach seemed a win across the board here: we either had to accept a
> footprint reduction, or see throughput suffer dramatically.
>  http://openjdk.java.net/projects/code-tools/jol/
More information about the core-libs-dev