RFR: JDK-8211727: Adjust default concurrency settings for running tests on Sparc
erik.joelsson at oracle.com
Wed Nov 14 17:05:42 UTC 2018
On 2018-11-13 20:03, David Holmes wrote:
> Hi Erik,
> Thanks for all the work you did in trying to stabilize this.
> One comment ...
> On 14/11/2018 7:34 am, Erik Joelsson wrote:
>> This patch changes the formula for default test concurrency in
>> RunTest.gmk. The current formula is:
>> min(cpus/2, 12)
>> This seems to work well enough on the x64 machines we currently run
>> our tests on, but less so for Sparc. I have now run rather extensive
>> testing in our lab and have come up with a new formula that provides
>> much better test reliability while preserving as much test throughput
>> as possible. The new formula is cpus/4 for SPARC machines with up to
>> 16 CPUs and cpus/5 for larger machines. For non-SPARC it's still
>> cpus/2, and I've removed the cap for all.
> I'm surprised that you removed the cap and that it is okay. IIRC we
> had problems with machines that had many CPUs but only medium amounts of RAM.
> Too high a concurrency level would result in memory exhaustion.
Dan brought this up too in chat when I first suggested it. I looked
through all available machines in Mach5. There are two non-SPARC machines
that have enough CPUs to be affected by the cap, and they have 256GB of
RAM, which is plenty. The rest are SPARC and have at least 1GB of RAM per
CPU, usually more. So with the proposed scheme, I can't see anything
really changing with regard to JOBS vs RAM. My testing did not reveal any such
problems when I scaled down concurrency enough. Also note that the
biggest SPARC we have has 64 CPUs which will now translate into 13 jobs,
which is just 1 more than the previous cap.
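For illustration, the proposed defaults can be sketched in Java as follows. This is a minimal sketch, not the actual RunTest.gmk make logic; rounding to nearest is an assumption on my part, chosen because it is consistent with 64 CPUs translating into 13 jobs as stated above.

```java
// Sketch of the proposed default test concurrency: cpus/2 on non-SPARC,
// cpus/4 on SPARC with up to 16 CPUs, cpus/5 on larger SPARC machines,
// and no upper cap. Rounding to nearest is assumed (64 CPUs -> 13 jobs).
public class TestConcurrency {
    static int defaultJobs(int cpus, boolean isSparc) {
        int divisor = !isSparc ? 2 : (cpus <= 16 ? 4 : 5);
        return Math.max(1, Math.round((float) cpus / divisor));
    }

    public static void main(String[] args) {
        System.out.println(defaultJobs(64, true));  // biggest SPARC in the lab
        System.out.println(defaultJobs(16, true));  // 16-CPU SPARC
        System.out.println(defaultJobs(8, false));  // typical x64 machine
    }
}
```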
We do have a separate issue with a few Macs that have low RAM compared
to CPUs (4GB and 8 CPUs), and I intend to attack that next. My plan is
basically to do something similar to what configure is doing for build
jobs (which is JOBS = min(cpus, RAM in GB)). The exact formula is to be
determined. I suspect it's going to involve a constant for the RAM part
to make room for the test harness, so something like (RAM - k)/x.
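A rough sketch of that RAM-aware direction, assuming the (RAM - k)/x shape above. Since the exact formula is explicitly left open, the constants below are hypothetical placeholders, not values from any patch.

```java
// Hypothetical sketch combining a cpu-based job count with a RAM limit
// of the form (RAM - k)/x, as described above. K_GB and X_GB are
// illustrative placeholders only; the real constants are undetermined.
public class RamAwareJobs {
    static final int K_GB = 2; // hypothetical headroom for the test harness
    static final int X_GB = 2; // hypothetical GB of RAM per concurrent job

    static int jobs(int cpuBasedJobs, int ramGb) {
        int ramLimit = Math.max(1, (ramGb - K_GB) / X_GB);
        return Math.min(cpuBasedJobs, ramLimit);
    }

    public static void main(String[] args) {
        // On a 4GB/8CPU Mac, RAM rather than CPU count becomes the limit.
        System.out.println(jobs(4, 4));
        // On a 256GB machine, the cpu-based count is unaffected.
        System.out.println(jobs(8, 256));
    }
}
```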
>> In addition to this, since SPARC generally has lower per-thread
>> performance, at least when running JDK tests, I have bumped the
>> default timeout factor from 4 to 8 for SPARC.
>> With these defaults, we were able to remove a lot of special cases
>> for Sparc in other parts of our configurations and I was able to get
>> clean runs of all the lower tiers of testing, on each of our machine
>> classes in the lab.
>> In addition to this, the test
>> compiler/jsr292/ContinuousCallSiteTargetChange.java, which had its
>> timeout increased in JDK-8212028, no longer needs an increased
>> timeout with the new defaults.
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8211727
>> Webrev: http://cr.openjdk.java.net/~erikj/8211727/webrev.01/