Request for tracking down C1 optimizations: handwritten cartesian product similar to flatmap/map performance!
vladimir.kozlov at oracle.com
Fri May 30 01:06:49 UTC 2014
Escape Analysis may not help you because most of the time is spent in
the loop, which only reads from arrays and does math. There are no
allocations in the loop. If the stream version produces the same loop
shape, the performance will be the same. We have the -XX:+TraceLoopOpts
flag, but it is available only in debug builds of the JVM.
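
To make the "same loop shape" point concrete, here is a minimal sketch of the two versions under discussion. The benchmark code itself is not shown in this thread, so the class name CartesianSketch, the methods cartLoop and cartStream, the long[] inputs, and the sum-of-products computation are all assumptions made for illustration only.

```java
import java.util.stream.LongStream;

// Hypothetical stand-in for the benchmark discussed in this thread.
public class CartesianSketch {

    // Handwritten version: nested loops that only read from the two
    // arrays and do arithmetic. Nothing is allocated inside the loops.
    static long cartLoop(long[] xs, long[] ys) {
        long sum = 0;
        for (int i = 0; i < xs.length; i++) {
            for (int j = 0; j < ys.length; j++) {
                sum += xs[i] * ys[j];
            }
        }
        return sum;
    }

    // Stream version: flatMap over the outer array, map over the inner one.
    // If the JIT inlines the pipeline, the hot code can reduce to a loop
    // nest similar to cartLoop above.
    static long cartStream(long[] xs, long[] ys) {
        return LongStream.of(xs)
                .flatMap(x -> LongStream.of(ys).map(y -> x * y))
                .sum();
    }

    public static void main(String[] args) {
        long[] xs = {1, 2, 3};
        long[] ys = {4, 5};
        System.out.println(cartLoop(xs, ys));   // (1+2+3) * (4+5) = 54
        System.out.println(cartStream(xs, ys)); // same value, 54
    }
}
```

Running something like this with -XX:-TieredCompilation, with and without -XX:-DoEscapeAnalysis, would be one way to repeat the comparison from the original question, although a proper measurement needs a benchmark harness and much larger arrays.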
On 5/29/14 11:03 AM, Andrew Haley wrote:
> On 05/29/2014 06:55 PM, Aggelos Biboudis wrote:
>> I would like to ask you something regarding C1 compilation (VM options:
>> -Xms769m -Xmx769m -XX:-TieredCompilation)
> That's C2 compilation.
>> of a Cartesian product stream
>> operation with the new stream API.
>> I have two versions of this computation, one handwritten and one with
>> flatmap/map. It is remarkable that these two have similar performance so I
>> would like to trace-back the JIT compilation decisions (apart from
>> inlining), and more specifically if escape analysis has any effect.
> Are you quite sure your numbers aren't dominated by cache misses? Your
> data is about 40 Megabytes and it's being accessed sequentially.
>> I've tested the code above with -XX:-DoEscapeAnalysis and I've got the same
>> execution times, however I would like to confirm what happens.
>> Regarding inlining, only by noticing the result of PrintInlining we
>> conclude that cartSeq inlines all the nested forEachRemaining operations
>> (of, flatmap, map), but is that the only optimization?
> Not if this really is C2, no. There are many optimization passes,
> and several will be effective for this code.