RFR (M) 8059461: Refactor IndexSet for better performance (preliminary)
aleksey.shipilev at oracle.com
Tue Nov 11 19:21:39 UTC 2014
Thanks for taking a look, Albert!
Of course, performance runs for other workloads and platforms are
underway. I do not expect such dramatic boosts there, though, since Nashorn
is one of the few compilation-heavy workloads in our benchmark set.
On 11/06/2014 12:36 PM, Albert Noll wrote:
> It would be good to have data from different applications / architectures.
> Does anyone know more about the motivation for IndexSet, i.e., why it
> was introduced in the first place?
> On 11/05/2014 11:58 PM, Vladimir Kozlov wrote:
>> I think it is nice cleanup with performance benefits :)
>> Aleksey, can you compare average memory consumed by IndexSet before
>> and after?
>> Why do you need initialize_in_resource_arena()? By default, BitMap() uses
>> the resource area:
>> BitMap(idx_t size_in_bits, bool in_resource_area = true);
>> Make lrg_union() a method of the PhaseConservativeCoalesce class.
>> On 11/5/14 1:26 PM, Aleksey Shipilev wrote:
>>> Long story short: the current implementation of IndexSet, while smart, is
>>> too smart for its own good. Trying to be sparse, it loses locality.
>>> Trying to be clever with bit tricks, it loses the benefit of operating
>>> on native machine words. Because of that, the sophisticated IndexSet
>>> does not yield the desired performance benefit on compilation-heavy
>>> workloads like Nashorn.
>>> Delegating the work to the already existing BitMap both reduces the
>>> amount of source code and improves performance. (C1 also uses a BitMap
>>> adapter like that for ValueMap-s.)
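For illustration, a minimal C++ sketch of the dense approach, assuming a plain array of 64-bit words: a hypothetical DenseIndexSet keeps one bit per live-range index, so membership is a shift-and-mask on one native word and counting is a per-word popcount. This is not HotSpot's BitMap or IndexSet API; the class and method names are made up for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration, not HotSpot code: a set of live-range indices
// stored densely, one bit per index, in native 64-bit words.
class DenseIndexSet {
 public:
  explicit DenseIndexSet(unsigned max_elements)
      : max_elements_(max_elements),
        words_((max_elements + kBitsPerWord - 1) / kBitsPerWord, 0) {}

  // Sets the bit; returns true if the index was not already a member.
  bool insert(unsigned idx) {
    uint64_t& w = word_for(idx);
    const uint64_t mask = bit_for(idx);
    const bool was_absent = (w & mask) == 0;
    w |= mask;
    return was_absent;
  }

  // Clears the bit; returns true if the index was a member.
  bool remove(unsigned idx) {
    uint64_t& w = word_for(idx);
    const uint64_t mask = bit_for(idx);
    const bool was_present = (w & mask) != 0;
    w &= ~mask;
    return was_present;
  }

  bool member(unsigned idx) const {
    assert(idx < max_elements_);
    return (words_[idx / kBitsPerWord] & bit_for(idx)) != 0;
  }

  // Number of members: one popcount per whole word.
  size_t count() const {
    size_t n = 0;
    for (uint64_t w : words_) n += std::bitset<64>(w).count();
    return n;
  }

 private:
  static constexpr unsigned kBitsPerWord = 64;

  uint64_t& word_for(unsigned idx) {
    assert(idx < max_elements_);
    return words_[idx / kBitsPerWord];
  }
  static uint64_t bit_for(unsigned idx) {
    return uint64_t(1) << (idx % kBitsPerWord);
  }

  unsigned max_elements_;
  std::vector<uint64_t> words_;
};
```

The trade-off is memory: a dense bitmap costs roughly max_elements/8 bytes per set no matter how few members it holds, which is what a sparse representation tries to avoid.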
>>> IndexSet is a major data structure for representing the interference
>>> graph (IFG) in the C2 register allocator, which is why improvements in
>>> IndexSet translate to faster register allocation and faster C2
>>> compiles. If you gut the IndexSet internals and replace them with
>>> BitMap, sample performance runs of Nashorn on the Octane suite show
>>> reliable improvements in compilation speed (average: 22.7 Kb/s ->
>>> 24.4 Kb/s), explained by the decrease in C2 regalloc time (average:
>>> 155s -> 132s).
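Expressed as relative changes, the averages above amount to roughly a 7.5% increase in compilation speed and a 14.8% reduction in C2 regalloc time. A trivial, purely illustrative helper for that arithmetic:

```cpp
#include <cassert>

// Relative change from `before` to `after`, in percent (negative = decrease).
// Illustrative helper only; inputs are the averages quoted in the message.
double percent_change(double before, double after) {
  assert(before != 0.0);
  return (after - before) / before * 100.0;
}

// percent_change(22.7, 24.4)   ~ +7.49  (compilation speed, Kb/s)
// percent_change(155.0, 132.0) ~ -14.84 (C2 regalloc time, s)
```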
>>> These improvements are in line with the improvements predicted by a trial
>>> experiment of "coarsening" the IndexSet; see the relevant RFE:
>>> In other words, we can have a performance-improving change which also
>>> removes lots of code. Or, a cleanup change, which also improves
>>> performance. Here it is:
>>> The patch is mostly a proof of concept and is not ready for commit.
>>> Please let me know what you think about the approach. Code/style/idea
>>> suggestions are welcome.
>>> Brief summary of changes:
>>> - Almost all contents of IndexSet are purged, and delegated to BitMap
>>> - IndexSetIterator performance is important, so the lookup-table
>>> approach from IndexSetIterator was transplanted to the new
>>> BitMapIterator. We might want to improve BitMap::get_next_one_offset
>>> with lookup tables as well, but that will contaminate the current
>>> - lrg_union was moved to a more appropriate place (why was it in
>>> IndexSet to begin with?)
>>> - some of the IndexSet memory management / tracing functions were purged
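The lookup-table iteration idea can be sketched roughly as follows. This is a hypothetical LookupBitIterator, not the actual BitMapIterator from the patch: it skips zero words wholesale and, inside a nonzero word, skips zero bytes and then consults a 256-entry table holding the position of the lowest set bit of each byte value. It assumes 64-bit words, and the vector passed in must outlive the iterator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: iterate the set bits of a word-based bitmap in
// ascending order, using a per-byte lookup table for the lowest set bit.
class LookupBitIterator {
 public:
  explicit LookupBitIterator(const std::vector<uint64_t>& words)
      : words_(words), word_idx_(0), current_(words.empty() ? 0 : words[0]) {}

  // Returns the next set bit index, or -1 when exhausted.
  long next() {
    // Skip whole zero words.
    while (current_ == 0) {
      if (++word_idx_ >= words_.size()) return -1;
      current_ = words_[word_idx_];
    }
    // Skip zero bytes inside the (nonzero) word.
    unsigned shift = 0;
    while (((current_ >> shift) & 0xFF) == 0) shift += 8;
    // Table lookup gives the lowest set bit within that byte.
    const unsigned bit = shift + table()[(current_ >> shift) & 0xFF];
    current_ &= current_ - 1;  // clear the bit just returned (lowest set bit)
    return static_cast<long>(word_idx_ * 64 + bit);
  }

 private:
  static const std::array<uint8_t, 256>& table() {
    static const std::array<uint8_t, 256> t = build_table();
    return t;
  }
  static std::array<uint8_t, 256> build_table() {
    std::array<uint8_t, 256> t{};  // t[0] is never consulted
    for (unsigned b = 1; b < 256; ++b) {
      uint8_t pos = 0;
      while (((b >> pos) & 1u) == 0) ++pos;
      t[b] = pos;
    }
    return t;
  }

  const std::vector<uint64_t>& words_;
  size_t word_idx_;
  uint64_t current_;
};
```

On a C++20 toolchain, std::countr_zero from <bit> would do the same job as the table; the table form just mirrors the approach described in the summary.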
>>> The testing so far was very light:
>>> - smoke tests with JPRT
>>> - Nashorn/Octane benchmarks
More information about the hotspot-compiler-dev