Parallel decompositions, C/P/N/Q experiment, take 3
aleksey.shipilev at oracle.com
Fri Oct 5 05:37:08 PDT 2012
The results for the decomposition benchmark on a 2x8x2 Xeon E5-2680
(SandyBridge) running Solaris 11 and the 20120925 lambda nightly with -d64
-XX:-TieredCompilation -XX:+UseParallelOldGC -XX:+UseNUMA
-XX:-UseBiasedLocking -XX:+UseCondCardMark are here.
This time the run focused mostly on the C=1, P=32 plane. It might be more
convenient to look at the high-level charts. Doug, we can infer the
break-even front (aka the par/seq = 1 isoline) from there. Raw data is here.
- the break-even front seems to fit N*Q = 2*10^5 in the high-Q/low-N part,
and 5*10^5 in the low-Q/high-N case; that means that for a very light
operation of just a few arithmetic operations, we need at least 10^5
elements in the stream to justify going for the parallel version.
- usr% is predictably low for N < P
- usr% is lower for lower Q, given the same N; this might highlight a
problem with parallel decomposition, and may explain the high break-even
- lower usr% is accompanied by larger sys% and ctxsw ratio
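
To make the break-even observation concrete, here is a minimal sketch of
how the N*Q fronts above could be turned into a "go parallel or not"
check. Everything in it is an assumption for illustration: the class and
method names are hypothetical, Q is taken to be in the same cost units as
the benchmark, and the code uses the final java.util.stream API rather
than the 20120925 nightly API. The two constants are only the values read
off the charts for this machine and build, not general thresholds.

```java
import java.util.stream.LongStream;

// Hypothetical helper, not part of the benchmark or of the lambda libraries.
final class BreakEvenSketch {
    // Break-even products N*Q reported above (this machine, this build only).
    static final double HIGH_Q_FRONT = 2e5; // high-Q / low-N part
    static final double LOW_Q_FRONT = 5e5;  // low-Q / high-N part

    // True when the estimated total work n*q reaches the given front.
    static boolean worthParallel(long n, double q, double front) {
        return (double) n * q >= front;
    }

    // A very light per-element operation: one multiply and one add.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream s = LongStream.range(0, n);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        double q = 1.0; // assumed cost of the light operation, in benchmark units
        boolean parallel = worthParallel(n, q, LOW_Q_FRONT);
        System.out.println("parallel=" + parallel
                + " result=" + sumOfSquares(n, parallel));
    }
}
```

The sequential and parallel paths compute the same result; only the
decision of which one to take depends on the front, so the constants can
be re-fitted per machine without touching the rest.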