RFR 8220613: [TEST] java/util/Arrays/TimSortStackSize2.java times out with fastdebug build
Roger.Riggs at oracle.com
Mon Mar 18 20:08:51 UTC 2019
Thanks for the suggestions.
The biggest difference seems to come from the -XX:-UseCompressedOops flag,
in both release and fastdebug builds.
The purpose of the test was to check that the stack size required is not
excessive. So what I think I need to answer is whether
it is beneficial to run this test and under what conditions.
The sort algorithm itself is not going to change, since the data to be
sorted is generated.
So the only variable is how much stack is used for each call.
And is it a beneficial use of 2.5 minutes of test time?
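For reference, the idea the test exercises can be sketched in plain JDK code (this is my own illustration, not the actual TimSortStackSize2 source, and the 256 KB stack budget is an arbitrary value chosen for the sketch): run Arrays.sort, which uses TimSort for object arrays, on a thread with a small explicit stack and check that it completes.

```java
import java.util.Arrays;
import java.util.Random;

public class StackBudgetSketch {
    static boolean isSorted(Integer[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Generated, deterministic input: as noted above, the only
        // variable is then how much stack each call uses.
        Integer[] data = new Random(42).ints(1_000_000).boxed()
                                       .toArray(Integer[]::new);

        // Sort on a thread with a deliberately small stack (a hint to
        // the JVM); TimSort keeps its run stack on the heap, so this
        // should succeed well within the budget.
        Thread sorter = new Thread(null, () -> Arrays.sort(data),
                                   "sorter", 256 * 1024);
        sorter.start();
        sorter.join();

        System.out.println("sorted=" + isSorted(data));
    }
}
```

The real test instead launches a child JVM via the jtreg test library and checks the exit status, but the stack-budget question is the same.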
On 03/18/2019 01:12 PM, Aleksey Shipilev wrote:
> On 3/14/19 4:23 PM, Roger Riggs wrote:
>> I didn't have a good idea where to look; the times do seem excessive.
> You do it with profilers. Since fastdebug probably has the bottleneck in the JVM code, you need a
> native profiler. On Linux, you do this:
> $ CONF=linux-x86_64-server-fastdebug perf record -g make images run-test
> ...and then open "perf report" and meditate. Sometimes it is easier to produce the high-level
> flamegraph, for instance with https://github.com/KDAB/hotspot:
> [flamegraph screenshot not preserved in the archive]
> What can you see here? G1 ConcurrentRefineThread spends a lot of time verifying stuff, as it would
> in fastdebug builds. This is one of the major contributors to this difference:
> release timing:
> real 0m12.485s
> user 0m40.930s
> sys 0m3.840s
> fastdebug timing:
> real 0m32.030s
> user 1m58.519s
> sys 0m5.172s
> So, there is a 3-4x difference. It is way off what was stated in the original problem:
> Release images build: 4 seconds
> Fastdebug images build: 2.5 minutes
> Anyway, if you apply this:
> diff -r 98e21d4da074 test/jdk/java/util/Arrays/TimSortStackSize2.java
> --- a/test/jdk/java/util/Arrays/TimSortStackSize2.java Mon Mar 18 15:21:33 2019 +0100
> +++ b/test/jdk/java/util/Arrays/TimSortStackSize2.java Mon Mar 18 17:52:09 2019 +0100
> @@ -71,4 +71,5 @@
> OutputAnalyzer output = ProcessTools.executeTestJava(xmsValue,
> + "-XX:+UseParallelGC",
> Then timings become:
> release:
> real 0m16.004s
> user 0m41.382s
> sys 0m4.660s
> fastdebug:
> real 0m17.292s
> user 1m8.225s
> sys 0m4.068s
> Repeating the profiling step shows that C2 becomes hot. Falling back to C1 would not help
> fastdebug timing, though: the faster compilation is offset by the less optimized code it produces.
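The one-line diff above pins the collector for the child JVM. As a plain-JDK illustration of that mechanism (the class and method names here are my own; the real test uses jtreg's ProcessTools.executeTestJava), launching a child JVM with an explicit GC flag looks roughly like:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FlagLaunchSketch {
    // Launch the current JDK's java binary with extra VM flags and a
    // trivial payload, returning the child's exit code. The real test
    // would run the sorting main class instead of -version.
    public static int runJavaWithFlags(String... flags) throws Exception {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        for (String f : flags) cmd.add(f);
        cmd.add("-version");
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Pin the collector so G1's fastdebug-only verification work
        // does not dominate the measured time.
        int exit = runJavaWithFlags("-XX:+UseParallelGC");
        System.out.println("exit=" + exit);
    }
}
```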