RFR: JDK-8211727: Adjust default concurrency settings for running tests on Sparc
david.holmes at oracle.com
Wed Nov 14 22:40:19 UTC 2018
Thanks for the further info, Erik. I guess we will see how this plays out.
On 15/11/2018 3:05 am, Erik Joelsson wrote:
> On 2018-11-13 20:03, David Holmes wrote:
>> Hi Erik,
>> Thanks for all the work you did in trying to stabilize this.
>> One comment ...
>> On 14/11/2018 7:34 am, Erik Joelsson wrote:
>>> This patch changes the formula for default test concurrency in
>>> RunTest.gmk. The current formula is:
>>> min(cpus/2, 12)
>>> This seems to work well enough on the x64 machines we currently run
>>> our tests on, but less so for Sparc. I have now run rather extensive
>>> testing in our lab and have come up with a new formula that provides
>>> much better test reliability while preserving as much test throughput
>>> as possible. The new formula is cpus/4 for Sparc machines with up to
>>> 16 cpus and cpus/5 for larger machines. For non-Sparc it's still
>>> cpus/2, and I've removed the cap for all.
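
For readers following the arithmetic, the old and new defaults described above can be sketched as below. This is only an illustration, not the RunTest.gmk code: the function names are hypothetical, and the rounding is an assumption (integer division for the old formula, round-half-up for the new one, chosen because it reproduces the 64 CPUs -> 13 jobs figure quoted elsewhere in this thread). The floor of one job is also an assumption.

```python
import math


def old_default_jobs(cpus: int) -> int:
    """Previous default as stated in the thread: min(cpus/2, 12)."""
    # ASSUMPTION: integer division and a floor of 1 job.
    return max(1, min(cpus // 2, 12))


def new_default_jobs(cpus: int, is_sparc: bool) -> int:
    """Proposed default: cpus/4 (Sparc, up to 16 cpus), cpus/5 (larger Sparc),
    cpus/2 (everything else), with no upper cap."""
    if is_sparc:
        divisor = 4 if cpus <= 16 else 5
    else:
        divisor = 2
    # ASSUMPTION: round half up, with a floor of 1 job.
    return max(1, math.floor(cpus / divisor + 0.5))


if __name__ == "__main__":
    for cpus, is_sparc in [(16, True), (64, True), (32, False)]:
        print(cpus, is_sparc, old_default_jobs(cpus), new_default_jobs(cpus, is_sparc))
```

Under these assumptions a 64-CPU Sparc machine moves from 12 jobs (the old cap) to 13, which is the "just 1 more" case discussed in the reply.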
>> I'm surprised that you removed the cap and that it is okay. IIRC we
>> had problems with large #CPU machines but only medium amounts of RAM.
>> Too high a concurrency level would result in memory exhaustion.
> Dan brought this up too in chat when I first suggested it. I looked
> through all available machines in Mach5. There are 2 non-SPARC machines
> that have enough CPUs to be affected by the cap and they have 256GB of
> RAM, which is plenty. The rest are SPARC and they have at least 1GB of
> RAM per CPU, usually more. So with the proposed scheme, I can't see
> anything really changing with regard to JOBS vs RAM. My testing did not
> reveal any such
> problems when I scaled down concurrency enough. Also note that the
> biggest SPARC we have has 64 CPUs which will now translate into 13 jobs,
> which is just 1 more than the previous cap.
> We do have a separate issue with a few Macs with low RAM compared to
> CPUs (4GB and 8 CPUs) and I intend to attack that next. My plan is
> basically to do something similar to what configure is doing for build
> jobs (which is JOBS=min(cpus, RAM in GB)). The exact formula to be
> determined. I suspect it's going to involve a constant for the RAM part
> to make room for the test harness so something like (RAM - k)/x.
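
The RAM-aware idea in the paragraph above might look roughly like the sketch below. The build-job rule min(cpus, RAM in GB) is taken from the text. Everything else is an assumption: the exact test formula is explicitly still to be determined, so the function names, the placeholder values for k and x, and the choice to combine the RAM limit with the CPU-based job count via min() are hypothetical.

```python
def build_jobs(cpus: int, ram_gb: float) -> int:
    """Build-job rule described in the thread: min(cpus, RAM in GB)."""
    # ASSUMPTION: truncate RAM to whole GB and never go below 1 job.
    return max(1, min(cpus, int(ram_gb)))


def ram_limited_test_jobs(cpu_jobs: int, ram_gb: float,
                          k: float = 1.0, x: float = 1.0) -> int:
    """Cap a CPU-based job count by (RAM - k) / x.

    k (GB reserved for the test harness) and x (GB per test job) are
    placeholder values, not figures from the thread.
    """
    ram_jobs = int((ram_gb - k) / x)
    # ASSUMPTION: take the smaller of the two limits, with a floor of 1 job.
    return max(1, min(cpu_jobs, ram_jobs))


if __name__ == "__main__":
    # The low-RAM Mac case from the thread: 8 CPUs and 4GB of RAM.
    cpu_jobs = 8 // 2  # cpus/2 for non-Sparc
    print(build_jobs(8, 4), ram_limited_test_jobs(cpu_jobs, 4))
```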
>>> In addition to this, since Sparc machines generally have lower per-thread
>>> performance, at least when running JDK tests, I have bumped the
>>> default timeout factor from 4 to 8 for Sparc.
>>> With these defaults, we were able to remove a lot of special cases
>>> for Sparc in other parts of our configurations and I was able to get
>>> clean runs of all the lower tiers of testing, on each of our machine
>>> classes in the lab.
>>> In addition to this, the test
>>> compiler/jsr292/ContinuousCallSiteTargetChange.java, which had its
>>> timeout increased in JDK-8212028, no longer needs an increased
>>> timeout with the new defaults.
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8211727
>>> Webrev: http://cr.openjdk.java.net/~erikj/8211727/webrev.01/
More information about the build-dev mailing list