run-to-run variance on C/P/N/Q experiments
aleksey.shipilev at oracle.com
Tue Oct 9 03:41:56 PDT 2012
Yes, turned off.
In fact, the configuration is the same as , namely:
decomposition benchmark on 2x8x2 Xeon E5-2680 (SandyBridge) running
Solaris 11, and 20120925 lambda nightly with -d64 -XX:-TieredCompilation
-XX:+UseParallelOldGC -XX:+UseNUMA -XX:-UseBiasedLocking
And the difference is sustained throughout the run (it could have been
explained by C1 getting different profiles in tiered mode, but tiered
compilation is not enabled in this particular case).
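One way to check that observation numerically is to split the score variance into a within-run part and a between-run part. Below is a minimal Java sketch. It assumes the scores are collected as scores[fork][iteration]. The class and method names are hypothetical, and the numbers in main() are made up for illustration, not taken from these runs:

```java
import java.util.Arrays;

/**
 * Hypothetical helper: splits the spread of benchmark scores into the part
 * seen inside each JVM invocation and the part seen between invocations.
 * scores[fork][iteration] holds one throughput score per iteration.
 */
class VarianceSplit {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample variance (n - 1 in the denominator).
    static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    // Average of the per-fork variances: noise inside a single JVM run.
    static double withinForkVariance(double[][] scores) {
        double sum = 0;
        for (double[] fork : scores) sum += variance(fork);
        return sum / scores.length;
    }

    // Variance of the per-fork means: drift from one JVM run to the next.
    static double betweenForkVariance(double[][] scores) {
        double[] means = Arrays.stream(scores).mapToDouble(VarianceSplit::mean).toArray();
        return variance(means);
    }

    public static void main(String[] args) {
        // Made-up numbers: each fork is stable, but forks settle at different levels.
        double[][] scores = {
            {100, 101, 99, 100},
            {140, 141, 139, 140},
            {120, 119, 121, 120},
        };
        System.out.printf("within-fork variance:  %.2f%n", withinForkVariance(scores));
        System.out.printf("between-fork variance: %.2f%n", betweenForkVariance(scores));
    }
}
```

If the between-fork variance dwarfs the within-fork variance, adding iterations to a single JVM run does not tighten the estimate. Only more JVM invocations do.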
On 10/09/2012 02:34 PM, Remi Forax wrote:
> Is it with tiered compilation enabled or not?
> I've found that tiered compilation introduces more jitter than when the
> VM is configured to use only C2.
> On 10/09/2012 11:18 AM, Aleksey Shipilev wrote:
>> I'm following up on the decomposition experiments, and this time I focus
>> on their run-to-run variance. I took one of the break-even points
>> of the previous experiment on the same machine , and executed it
>> multiple times.
>> For C=1, P=32, N=3000, Q=20 in the parallel case, we ran the tests in two modes:
>> a. 10 iterations per JVM invocation, 1000 JVM runs 
>> b. 100 iterations per JVM invocation, 10 JVM runs 
>> The bottom line for this experiment is that we experience a huge
>> run-to-run variance, which has been triaged as JIT compilation jitter:
>> - scores drift from run to run, but stay within bounds inside a single run
>> - -Xint mitigates the variance (with a huge penalty in scores)
>> - -Xcomp -Xbatch mitigates the variance (but drops the scores)
>> That also means that our break-even points are roughly 30-50% off
>> the true value. We found no reasonable way to lower the run-to-run
>> variance without a performance penalty, so the only option left at this
>> point is to run with multiple JVM invocations.
>> The disassembly dumps captured for the low-score and high-score runs are here .
>> The integer there is the throughput we got with that code. If anyone
>> can make sense of those logs alone, you are welcome to do so. The
>> entry point for the microbenchmark is the "testParallel" method. The inline
>> trees are somewhat different, but not different enough to readily explain
>> the performance difference.
>>  http://shipilev.net/pub/jdk/lambda/runtorun-variance/i10-f1000/
>>  http://shipilev.net/pub/jdk/lambda/runtorun-variance/i100-f10/
>>  http://shipilev.net/pub/jdk/lambda/runtorun-variance/i10-f1000/asms/
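Since the JVM invocations, not the iterations, are the independent samples here, a multi-invocation result could be reported as the mean of the per-fork means with an interval over those means. Below is a rough Java sketch. The class name, the made-up scores, and the 1.96 normal-approximation factor are all assumptions. For only a few forks, a Student's t quantile would give a wider and more honest interval:

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: report a score aggregated over several JVM invocations.
 * Each invocation contributes one number (its mean score), so the forks are
 * treated as the independent samples.
 */
class ForkAggregate {

    // z-value for a two-sided 95% interval; a normal approximation that
    // understates the width when there are only a few forks.
    static final double Z_95 = 1.96;

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().getAsDouble();
    }

    /** Returns {grandMean, halfWidth} computed over the per-fork means. */
    static double[] summarize(double[][] scores) {
        double[] forkMeans = Arrays.stream(scores).mapToDouble(ForkAggregate::mean).toArray();
        double m = mean(forkMeans);
        double ss = 0;
        for (double x : forkMeans) ss += (x - m) * (x - m);
        double sd = Math.sqrt(ss / (forkMeans.length - 1)); // sample std dev of fork means
        double stdErr = sd / Math.sqrt(forkMeans.length);  // standard error of the grand mean
        return new double[] {m, Z_95 * stdErr};
    }

    public static void main(String[] args) {
        // Made-up scores[fork][iteration], for illustration only.
        double[][] scores = {
            {100, 101, 99, 100},
            {140, 141, 139, 140},
            {120, 119, 121, 120},
        };
        double[] r = summarize(scores);
        System.out.printf("score = %.1f +- %.1f (95%%, normal approx.)%n", r[0], r[1]);
    }
}
```

The interval shrinks with the square root of the number of forks, which is why running more iterations inside one JVM cannot substitute for more invocations when the forks settle at different levels.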