RFR (M) 8059461: Refactor IndexSet for better performance (preliminary)
aleksey.shipilev at oracle.com
Wed Nov 5 21:26:36 UTC 2014
Long story short: the current implementation of IndexSet, while smart, is
too smart for its own good. Trying to be sparse, it loses locality.
Trying to be smart about bit tricks, it loses the native word length.
Because of that, the sophisticated IndexSet does not yield the desired
performance benefit on compilation-heavy workloads like Nashorn.
Delegating the work to the already existing BitMap both reduces the
amount of source code and improves performance. (C1 also uses a BitMap
adapter like that for ValueMaps.)
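To make the delegation idea concrete, here is a minimal standalone sketch of a set of small integers backed by a dense array of native-width words. It is not the HotSpot code: the class name DenseIndexSet and all of its members are hypothetical, and std::vector stands in for the arena-allocated storage that BitMap would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an IndexSet that keeps its elements in a
// dense bitmap of native-width words (not the actual HotSpot classes).
class DenseIndexSet {
 public:
  explicit DenseIndexSet(std::size_t max_elements)
      : _words((max_elements + kBitsPerWord - 1) / kBitsPerWord, 0),
        _max(max_elements),
        _count(0) {}

  // Returns true if the element was not in the set before.
  bool insert(std::size_t element) {
    assert(element < _max);
    std::uintptr_t& word = _words[element / kBitsPerWord];
    const std::uintptr_t bit = bit_mask(element);
    if ((word & bit) != 0) return false;
    word |= bit;
    ++_count;
    return true;
  }

  // Returns true if the element was in the set and has been removed.
  bool remove(std::size_t element) {
    assert(element < _max);
    std::uintptr_t& word = _words[element / kBitsPerWord];
    const std::uintptr_t bit = bit_mask(element);
    if ((word & bit) == 0) return false;
    word &= ~bit;
    --_count;
    return true;
  }

  bool member(std::size_t element) const {
    assert(element < _max);
    return (_words[element / kBitsPerWord] & bit_mask(element)) != 0;
  }

  std::size_t count() const { return _count; }

 private:
  static constexpr std::size_t kBitsPerWord = sizeof(std::uintptr_t) * 8;

  static std::uintptr_t bit_mask(std::size_t element) {
    return std::uintptr_t(1) << (element % kBitsPerWord);
  }

  std::vector<std::uintptr_t> _words;  // contiguous, so scans stay local
  std::size_t _max;                    // exclusive upper bound on elements
  std::size_t _count;                  // cached cardinality
};
```

The trade-off in this sketch is the one argued above: the storage is not sparse, but every operation touches a single native word in a contiguous array.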
IndexSet is a major data structure for representing the interference
graph (IFG) in C2 Regalloc, and that is why improvements in IndexSet
translate to faster register allocation, and faster C2 compiles. If you
gut the IndexSet internals and replace them with BitMap, sample
performance runs of Nashorn running the Octane suite yield reliable
improvements in compilation speed (average: 22.7 Kb/s -> 24.4 Kb/s),
explained by the decrease in C2 regalloc time (average: 155s -> 132s).
These improvements are in line with the improvements predicted by a
trial experiment of "coarsening" the IndexSet, see the relevant RFE:
In other words, we can have a performance-improving change which also
removes lots of code. Or, a cleanup change, which also improves
performance. Here it is:
The patch is mostly a proof of concept, and is not ready for commit.
Please let me know what you think about the approach. Code/style/idea
suggestions are welcome.
Brief summary of changes:
- Almost all contents of IndexSet are purged, and delegated to BitMap
- IndexSetIterator performance is important, and therefore the lookup
table approach from IndexSetIterator was transplanted to the new
BitMapIterator. We might want to improve BitMap::get_next_one_offset
with lookup tables as well, but that will contaminate the current
- lrg_union was moved to appropriate place (why was it in IndexSet to
- some of IndexSet memory management / tracing functions were purged
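For readers unfamiliar with the lookup table approach mentioned in the list above, the following is a rough standalone sketch of iterating over set bits with a small precomputed table. It is an illustration only: the name BitIterator, the 256-entry byte table, and the 64-bit word type are assumptions made here, and the table layout used by IndexSetIterator and the patch's BitMapIterator may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a lookup-table bit iterator (not HotSpot code).
class BitIterator {
 public:
  static constexpr std::size_t kDone = static_cast<std::size_t>(-1);

  // The iterator keeps a reference; 'words' must outlive it.
  explicit BitIterator(const std::vector<std::uint64_t>& words)
      : _words(words),
        _word_index(0),
        _current(words.empty() ? 0 : words[0]) {}

  // Returns the index of the next set bit, or kDone when exhausted.
  std::size_t next() {
    // Skip over words that have no bits left.
    while (_current == 0) {
      if (++_word_index >= _words.size()) return kDone;
      _current = _words[_word_index];
    }
    // Skip zero bytes, then look up the lowest set bit of the first
    // non-zero byte in the table.
    std::uint64_t value = _current;
    unsigned shift = 0;
    while ((value & 0xFF) == 0) {
      value >>= 8;
      shift += 8;
    }
    const unsigned bit = shift + table().lowest_bit[value & 0xFF];
    _current &= _current - 1;  // clear the bit we are about to return
    return _word_index * 64 + bit;
  }

 private:
  // lowest_bit[b] is the position of the lowest set bit in byte b;
  // entry 0 is never consulted.
  struct Table {
    std::uint8_t lowest_bit[256];
    Table() {
      lowest_bit[0] = 0;
      for (unsigned b = 1; b < 256; ++b) {
        unsigned pos = 0;
        while (((b >> pos) & 1) == 0) ++pos;
        lowest_bit[b] = static_cast<std::uint8_t>(pos);
      }
    }
  };

  static const Table& table() {
    static const Table t;  // built once on first use
    return t;
  }

  const std::vector<std::uint64_t>& _words;
  std::size_t _word_index;
  std::uint64_t _current;  // remaining bits of the current word
};
```

A caller would loop with `for (std::size_t i = it.next(); i != BitIterator::kDone; i = it.next())`. On hardware with a count-trailing-zeros instruction the table could be replaced by a compiler intrinsic; the table variant is shown because it is the approach the list above refers to.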
The testing so far was very light:
- smoke tests with JPRT
- Nashorn/Octane benchmarks