Fwd: Better default for ParallelGCThreads and ConcGCThreads by using number of physical cores and CPU mask.
david.holmes at oracle.com
Mon Nov 25 00:24:13 PST 2013
On 25/11/2013 4:11 PM, Jon Masamitsu wrote:
> Thanks for taking a look at this.
> On 11/24/2013 6:19 PM, David Holmes wrote:
>> Hi Jon,
>> On 23/11/2013 3:24 AM, Jon Masamitsu wrote:
>>> This is a contribution regarding the number of GC worker threads to
>>> use. Part of the change queries /proc on Linux to get the number of
>>> active cores on the platform. The changes are in
>>> Can someone familiar with this code take a look to see
>>> whether it is reasonable and done in a way that is consistent
>>> with other /proc queries?
>> I can't comment on that specifically but I do have reservations about
>> this proposed patch.
>> First, we have a general problem: "active processor count" doesn't
>> take into account the various resource management mechanisms that can
>> limit the actual "processors" available to the VM when it is running.
>> I would prefer to see that general problem solved. It also isn't clear
>> to me that the sched_getaffinity usage will correctly reflect the use
>> of tasksets/cpusets. (Note: on Solaris we try to handle some of these
>> mechanisms, e.g. pbind and psrsets, but still don't handle resource pools.)
> Is there any work being done on the general problem? I also would
> like to see this solved. I've always thought of it as runtime code that
> GC uses. Do you see it as a GC responsibility?
No this is a runtime issue.
>> Second, this feeds into future work on NUMA-awareness that will likely
>> need a more sophisticated set of APIs.
> Can you explain more?
NUMA-aware APIs need access to the underlying machine topology, so there
will have to be a VM interface that exposes that information in a suitable
way. This might involve information on sockets, cores,
"hyper-threading", processor IDs, etc.
>> Third, I dislike that this is only really addressing linux-x86 and
>> leaving the other platforms to default to cores==processors. That just
>> causes unnecessary divergence in platform functionality.
> This is an interesting question with regard to OpenJDK contributions.
> Yes, Oracle should do its best to implement on all platforms, but
> Google is a Linux x86 shop and I personally don't expect them to implement
> and performance test on all the supported platforms. Should we be making
> that a requirement for OpenJDK contributions?
I can't say that we can make it a requirement of the original
contributor, but we should be advancing the platform, not individual ports.
>> This is too late for JDK 8 and I think we will be doing more complete
>> work in this area during JDK 9 development.
> Agreed that it is too late for JDK 8. I would think it would be suitable
> for an 8 update, however. What is coming in JDK 9 that affects this?
Hopefully some NUMA-aware APIs :)
That aside, this has to go into 9 before it can be considered for a
backport to 8u.
>>> -------- Original Message --------
>>> Subject: Better default for ParallelGCThreads and ConcGCThreads by
>>> using number of physical cores and CPU mask.
>>> Date: Tue, 19 Nov 2013 15:35:22 -0800
>>> From: Jungwoo Ha <jwha at google.com>
>>> To: hotspot-gc-dev at openjdk.java.net
>>> I am sending this webrev for review.
>>> (Jon Masamitsu uploaded it here on my behalf.)
>>> The feature is a new heuristic for calculating the default
>>> ParallelGCThreads and ConcGCThreads.
>>> On x86, hyper-threading is generally bad for GC because of cache
>>> contention. Hence, using all the hyper-threaded cores slows down overall
>>> GC performance.
>>> Currently, HotSpot uses the number of processors that Linux reports,
>>> which counts every hyper-thread as a full core.
>>> The second problem is that when a CPU mask is set, not all of the cores
>>> are available to the GC.
>>> The patch improves the heuristics by determining the physical cores
>>> that are actually available, from the proc filesystem and the CPU mask,
>>> and using that count as the basis for calculating ParallelGCThreads
>>> and ConcGCThreads.
>>> The improvement in GC pause time is significant. We evaluated on
>>> Nehalem, Westmere and Sandy Bridge as well as several AMD processors. We
>>> also evaluated various CPU mask configurations and single/dual socket
>>> machines. In almost all cases, GC pause times improved by 10% to 50%.
>>> We primarily used the CMS collector for the evaluation, but we also
>>> tested other GCs.
>>> Please take a look and let me know if this patch can be accepted.
>>> Jungwoo Ha
More information about the hotspot-runtime-dev mailing list