Parallel decompositions, C/P/N/Q experiment, take 3

Aleksey Shipilev aleksey.shipilev at
Fri Oct 5 05:37:08 PDT 2012


The results for the decomposition benchmark on a 2x8x2 Xeon E5-2680
(SandyBridge) running Solaris 11, with the 20120925 lambda nightly and -d64
-XX:-TieredCompilation -XX:+UseParallelOldGC -XX:+UseNUMA
-XX:-UseBiasedLocking -XX:+UseCondCardMark, are here [1].

This time the experiment focused mostly on the C=1, P=32 plane. It may be more
convenient to look at the high-level charts [2]. Doug, we can infer the
break-even front (a.k.a. the par/seq = 1 isoline) from there. The raw data is here [3].
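To make reading the isoline off the data concrete, here is a minimal sketch (hypothetical, not part of the benchmark) that estimates where one Q-slice of the measurements crosses par/seq = 1, by interpolating linearly in log10(N) between the two neighbouring points. The class name BreakEven, the method crossing(), and the sample readings in main() are illustrative assumptions only. The code finds the crossing in either direction, so it does not matter whether par/seq is a time ratio or a throughput ratio.

```java
/**
 * Hypothetical helper (not part of the benchmark): estimates where a series of
 * par/seq measurements, taken at increasing N for one fixed Q, crosses 1.0.
 */
final class BreakEven {

    private BreakEven() {}

    /**
     * Returns the estimated N at which par/seq == 1.0, interpolating linearly
     * in log10(N) between the two neighbouring points; NaN if no crossing.
     *
     * @param ns     element counts, positive and strictly increasing
     * @param ratios par/seq measured at each entry of ns
     */
    static double crossing(long[] ns, double[] ratios) {
        if (ns.length != ratios.length) {
            throw new IllegalArgumentException("ns and ratios must have the same length");
        }
        for (int i = 1; i < ns.length; i++) {
            double d0 = ratios[i - 1] - 1.0; // signed distance from break-even
            double d1 = ratios[i] - 1.0;
            if (d0 == 0.0) {
                return ns[i - 1];            // exactly on the isoline
            }
            if (Math.signum(d0) != Math.signum(d1)) {
                double x0 = Math.log10(ns[i - 1]);
                double x1 = Math.log10(ns[i]);
                double t = d0 / (d0 - d1);   // fraction of the way from x0 to x1
                return Math.pow(10.0, x0 + t * (x1 - x0));
            }
        }
        return Double.NaN;                   // the series never crosses 1.0
    }

    public static void main(String[] args) {
        // Made-up readings for illustration only, NOT data from this experiment.
        long[] ns = {1_000L, 10_000L, 100_000L, 1_000_000L};
        double[] ratios = {3.0, 2.0, 0.8, 0.4};
        System.out.printf("estimated break-even N ~ %.0f%n", crossing(ns, ratios));
    }
}
```

Repeating this for every Q gives one point of the front per Q, which can then be compared against a candidate N*Q = const curve.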

Some observations:
 - the break-even front seems to fit N*Q = 2*10^5 in the high-Q/low-N part,
and N*Q = 5*10^5 in the low-Q/high-N part; that means that for a very light
operation of just a few arithmetic operations, we need at least 10^5 elements
in the stream to justify going for the parallel version.
 - usr% is predictably low for N < P
 - usr% is lower for lower Q, given the same N; this might highlight a
problem with parallel decomposition, and may explain the high break-even point
 - lower usr% is accompanied by larger sys% and ctxsw ratio
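The N*Q rule from the first observation can be written down as a simple guard for choosing between sequential and parallel execution. This is a minimal sketch under several assumptions: the class and method names are hypothetical, Q is taken to be in the benchmark's own per-element cost units, it uses the final Java 8 java.util.stream API rather than the 20120925 nightly, and it picks the more conservative constant 5*10^5 from the low-Q/high-N fit. That constant describes only this machine and this C=1, P=32 configuration.

```java
import java.util.stream.LongStream;

/**
 * Hypothetical guard based on the empirical front N*Q ~ 5*10^5 observed on one
 * machine (2x8x2 Xeon E5-2680, C=1, P=32); not a general-purpose rule.
 */
final class NQHeuristic {

    // Assumption: the more conservative (low-Q/high-N) fit; Q is in the
    // benchmark's own per-element cost units.
    static final long NQ_THRESHOLD = 500_000L;

    private NQHeuristic() {}

    /** Smallest N whose N*Q reaches the threshold, for per-element cost q > 0. */
    static long minParallelN(long q) {
        if (q <= 0) {
            throw new IllegalArgumentException("q must be positive");
        }
        // Ceiling division without computing n*q, which could overflow.
        return NQ_THRESHOLD / q + (NQ_THRESHOLD % q == 0 ? 0 : 1);
    }

    /** True if n elements of cost q each are on or past the break-even front. */
    static boolean worthParallel(long n, long q) {
        return q > 0 && n >= minParallelN(q);
    }

    /** A [0, n) range that is made parallel only when the guard says it pays off. */
    static LongStream range(long n, long q) {
        LongStream s = LongStream.range(0, n);
        return worthParallel(n, q) ? s.parallel() : s;
    }

    public static void main(String[] args) {
        // q = 5 stands in for "just a few arithmetic operations".
        System.out.println(minParallelN(5));                  // 100000
        System.out.println(range(1_000, 5).isParallel());     // false
        System.out.println(range(1_000_000, 5).isParallel()); // true
    }
}
```

Keeping the guard as a plain threshold on N*Q mirrors how the front was fitted; on other hardware, or for C > 1, the constant would have to be re-measured.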


