RFR(S): 8198756: Limit number of compiler threads for small code cache
martin.doerr at sap.com
Thu Mar 1 08:31:52 UTC 2018
We observed that, when using a small code cache, the compiler threads fill up the code cache faster than the sweeper can clean it.
This doesn't seem beneficial at all.
Some customers try to save memory by using a very small code cache. It's very annoying that so much memory gets wasted on such a large number of idle compiler threads, which hold on to their arenas etc.
Maybe the current formula was optimized for a special scenario with many slow cores? Maybe SPARC Niagara?
Shouldn't such scenarios use a large code cache? Maybe much more than 240MB?
From: Igor Veresov [mailto:igor.veresov at oracle.com]
Sent: Thursday, March 1, 2018 08:05
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache
I’m curious about the rationale for tying the number of threads to the size of the code cache. Is it because you don’t want them to keep holding the space for their code buffers when they are idle?
On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
The problem with your proposal is that we would no longer scale the number of compiler threads when we have a lot of CPUs (>1000 on big "slow" machines).
By default, tiered compilation has 240MB for the CodeCache. With your formula we will always have 7 threads (2 C1 and 5 C2), which could be fine if the machine has fewer than 32 procs/threads in total. But for big machines it may be a bottleneck for JIT-compilation-intensive applications (and for startup, when most JIT compilations happen).
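The arithmetic behind the "7 threads (2 C1 and 5 C2)" figure can be sketched as follows. This is only an illustration: the limit of one thread per 32MB comes from the proposal quoted further down, while the split of roughly one third C1 and the rest C2 is an assumption chosen to reproduce the numbers in this message. The function names are hypothetical and are not VM code.

```python
MB = 1024 * 1024

def threads_for_code_cache(code_cache_bytes, bytes_per_thread=32 * MB):
    """Proposed cap: at most one compiler thread per 32MB of code cache, never below 1."""
    return max(1, code_cache_bytes // bytes_per_thread)

def split_c1_c2(total):
    """Assumed tiered split: about one third C1, the remainder C2, each at least 1."""
    c1 = max(1, total // 3)
    c2 = max(1, total - c1)
    return c1, c2

total = threads_for_code_cache(240 * MB)
print(total, split_c1_c2(total))  # prints: 7 (2, 5)
```

Under these assumptions the result does not depend on the number of CPUs at all, which is the scaling concern raised above.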
The main motivation of the current approach was to reach peak performance (C2 compilations) as fast as possible. What we usually observed before was a large compilation queue for C2 because of the slow throughput of C2. It was especially visible with tiered compilation, where compilation thresholds are reached faster with first-tier compiled profiling code.
And I agree that we may have a problem with the number of compiler threads at the beginning of the graph (< 32 CPU threads), where the number grows too fast:
Graph for 3*log2(x)*log2(log2(x))/2
[Plot not preserved. Marked point: x = 32.0711217, y = 17.4325495]
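To get a feel for how quickly the curve rises below 32 CPU threads, the expression from the graph can be evaluated directly. This is a minimal sketch of the continuous curve only; the VM's actual computation may use integer logarithms and additional bounds, and the function name is hypothetical.

```python
import math

def compiler_threads_curve(cpus):
    """Evaluate 3*log2(x)*log2(log2(x))/2; only defined for x > 1."""
    log_cpu = math.log2(cpus)
    return 3 * log_cpu * math.log2(log_cpu) / 2

# Print the curve value for a few CPU counts (rounded to one decimal place).
for cpus in (4, 8, 16, 32, 64, 256, 1024):
    print(cpus, round(compiler_threads_curve(cpus), 1))
```

The curve gives exactly 3 at 4 CPUs, 12 at 16 CPUs and 36 at 256 CPUs, so the count quadruples between 4 and 16 CPUs but only triples between 16 and 256.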
Maybe we should have a formula which takes into account both the code cache size and the number of CPU threads.
Igor Veresov was the original developer of the current formula. It would be nice to hear his opinion.
On 2/27/18 8:10 AM, Doerr, Martin wrote:
The VM currently starts a large number of compiler threads on systems with many CPUs, regardless of the code cache size.
This doesn't make sense for very small code cache sizes.
The dynamically determined number of compiler threads can be observed by:
jdk/bin/java -XX:ReservedCodeCacheSize=128m -XX:+PrintFlagsFinal -version|grep CICompiler
I suggest not to use more than 1 compiler thread per 32MB of code cache:
This seems to be conservative.
Please review and let me know if you have a different limitation proposal.
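As a rough illustration of the suggested limit, the sketch below caps a CPU-based thread count at one thread per 32MB of reserved code cache. The patch itself is not shown in this message, so the helper name, the lower bound of one thread, and the sample CPU-based count of 12 are assumptions made for the example.

```python
MB = 1024 * 1024

def limit_by_code_cache(cpu_based_count, reserved_code_cache_bytes):
    """Cap the CPU-based count at one thread per 32MB of code cache (assumed minimum: 1)."""
    cache_limit = max(1, reserved_code_cache_bytes // (32 * MB))
    return min(cpu_based_count, cache_limit)

# Hypothetical machine whose CPU-based formula yields 12 compiler threads.
for cache_mb in (16, 64, 128, 240):
    print(cache_mb, limit_by_code_cache(12, cache_mb * MB))
```

With these assumptions, the 128m setting from the command above would allow at most 4 compiler threads, and the default 240MB would allow 7.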