Fwd: Better default for ParallelGCThreads and ConcGCThreads by using number of physical cores and CPU mask.
jon.masamitsu at oracle.com
Mon Nov 25 16:47:24 UTC 2013
Looping in hotspot-gc-dev to the discussion.
David Holmes raised the question of this enhancement only
being implemented (in effect) and tested on linux systems.
The contributor works in a linux shop. What should we
require for an enhancement? At Oracle we try hard to
implement uniformly on all platforms. No, we don't always
succeed, but we have it on our minds. We have a potential
contribution that has shown benefit on a wide variety of
applications in a Linux environment, so we are considering it.
There is no guarantee that the GC team here has
the time to fill out the implementation on other platforms.
I would hate to think that that would block a good
enhancement but, as I said, we like to have uniform support
on all platforms if possible.
I really don't know where this is going. I know we're
not going to make all of the people happy all of the time,
but if you want to say something about this, please do.
On 11/24/13 10:11 PM, Jon Masamitsu wrote:
> Thanks for taking a look at this.
> On 11/24/2013 6:19 PM, David Holmes wrote:
>> Hi Jon,
>> On 23/11/2013 3:24 AM, Jon Masamitsu wrote:
>>> This is a contribution regarding the number of GC worker threads to
>>> use. Part of the change queries /proc on linux to get the number of
>>> active cores on the platform. The changes are in
>>> Can someone familiar with this code take a look to see
>>> if it is reasonable and done in a way that is consistent
>>> with other /proc queries.
>> I can't comment on that specifically but I do have reservations about
>> this proposed patch.
>> First we have a general problem that "active processor count" doesn't
>> take into account the various resource management mechanisms that can
>> limit the actual "processors" available to the VM when it is running.
>> I would prefer to see that general problem solved. It also isn't
>> clear to me that the sched_getaffinity usage will correctly reflect
>> the use of tasksets/cpusets. (Note on solaris we try to handle some
>> of these mechanisms eg pbind and psrsets but still don't handle
>> resource pools.)
> Is there any work being done on the general problem? I also would
> like to see this solved. I've always thought of it as runtime code that
> GC uses. Do you see it as a GC responsibility?
>> Second, this feeds into future work on NUMA-awareness that will
>> likely need a more sophisticated set of APIs.
> Can you explain more?
>> Third, I dislike that this is only really addressing linux-x86 and
>> leaving the other platforms to default to cores==processors. That
>> just causes unnecessary divergence in platform functionality.
> This is an interesting question with regard to open jdk contributions.
> Yes, Oracle should do its best to implement on all platforms but
> Google is a linux x86 shop and I personally don't expect them to
> implement and performance test on all the supported platforms. Should we be
> making that a requirement for open jdk contributions?
>> This is too late for JDK 8 and I think we will be doing more complete
>> work in this area during JDK 9 development.
> Agreed that it is too late for jdk8. I would think it would be
> suitable for an 8 update, however. What is coming in jdk9 that
> affects this?
>>> -------- Original Message --------
>>> Subject: Better default for ParallelGCThreads and ConcGCThreads by
>>> using number of physical cores and CPU mask.
>>> Date: Tue, 19 Nov 2013 15:35:22 -0800
>>> From: Jungwoo Ha <jwha at google.com>
>>> To: hotspot-gc-dev at openjdk.java.net
>>> I am sending this webrev for the review.
>>> (On behalf of Jon Masamitsu, it is uploaded here)
>>> The feature is a new heuristic to calculate the default
>>> ParallelGCThreads and ConcGCThreads.
>>> In x86, hyperthreading is generally bad for GC because of the cache
>>> Hence, using all the hyper-threaded cores will slow down the overall GC
>>> Current hotspot reads the number of processors that Linux reports,
>>> which treats all hyper-threaded cores equally.
>>> The second problem is that when a CPU mask is set, not all the cores are
>>> available for the GC.
>>> The patch improves the heuristic by evaluating the physical cores
>>> actually available, based on the proc filesystem and the CPU mask,
>>> and uses that as the basis for calculating the ParallelGCThreads
>>> and ConcGCThreads.
>>> The improvement in GC pause time is significant. We evaluated on
>>> Nehalem, Westmere, Sandybridge as well as several AMD processors. We
>>> also evaluated on various CPU mask configuration and single/dual socket
>>> In almost all cases, GC pause time improved by 10-50%.
>>> We primarily used the CMS collector for the evaluation, but we also
>>> tested other GCs.
>>> Please take a look and let me know if this patch can be accepted.
>>> Jungwoo Ha
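
For readers following the thread, the idea in the original message (count physical cores rather than hyper-threaded logical CPUs, and only those permitted by the CPU mask) can be sketched roughly as below. This is an illustrative sketch only, not the code from the webrev: the function names are hypothetical, it assumes the x86 Linux /proc/cpuinfo layout with "processor", "physical id" and "core id" fields, and it does not address the cpuset and resource-pool concerns raised in the replies.

```python
import os


def count_physical_cores(cpuinfo_text, allowed_cpus=None):
    """Count distinct physical cores among the allowed logical CPUs.

    cpuinfo_text: contents of /proc/cpuinfo (x86 layout assumed).
    allowed_cpus: set of logical CPU numbers permitted by the CPU mask,
                  or None to consider every logical CPU.
    """
    cores = set()
    # /proc/cpuinfo separates logical CPUs with a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        cpu = int(fields["processor"])
        if allowed_cpus is not None and cpu not in allowed_cpus:
            continue
        # Hyper-threaded siblings share the same (physical id, core id).
        # If the topology fields are missing, treat each logical CPU as
        # its own core (i.e. fall back to cores == processors).
        fallback = "cpu%d" % cpu
        cores.add((fields.get("physical id", fallback),
                   fields.get("core id", fallback)))
    return len(cores)


def available_physical_cores():
    """Linux-only convenience wrapper (hypothetical helper)."""
    with open("/proc/cpuinfo") as f:
        text = f.read()
    # os.sched_getaffinity(0) returns the CPU mask of the current process.
    return count_physical_cores(text, os.sched_getaffinity(0))
```

On a single-socket machine with two cores and hyper-threading enabled, a mask that allows only the two sibling threads of one core would yield a count of 1, where the plain processor count would report 2. Whether the affinity mask reflects every resource-management mechanism in use is exactly the open question discussed above.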