RFR: 8227006 - [linux] Runtime.availableProcessors execution time increased by factor of 100
sgehwolf at redhat.com
Thu Oct 31 14:44:36 UTC 2019
On Wed, 2019-10-30 at 14:35 -0400, Bob Vandette wrote:
> Summary: The container support added to Linux caused the call to os::active_processor_count to
> become much more expensive than prior to the container addition. This RFR adds a cache to avoid
> the extra overhead on repetitive calls to Runtime.getRuntime().availableProcessors(). The cache
> timeout is set at 20ms. This allows the docker update command to alter the number of available processors
> which will be reflected in the return from the API.
> BUG: https://bugs.openjdk.java.net/browse/JDK-8227006
> WEBREV: http://cr.openjdk.java.net/~bobv/8227006/webrev.01
I was surprised that this actually improves performance, so I had a
closer look. Prior to this patch, os::Linux::active_processor_count() is
called unconditionally. Post-patch it's not called while we still
have the cached value (UseContainerSupport=true by default). So it
makes sense now. Contrast the results with
'-XX:ActiveProcessorCount=X' runs which, similarly, don't call
os::Linux::active_processor_count(). For that reason
-XX:-UseContainerSupport will "regress" after this patch.
  #define OSCONTAINER_ERROR (-2)
  #define OSCONTAINER_CACHE_TIMEOUT (NANOSECS_PER_SEC/50)
  [...]
  // Update the value and set the cache timeout to 20ms.
  OSContainer::_active_processor_count = result;
Perhaps we could move the references to "20ms" to where
OSCONTAINER_CACHE_TIMEOUT is defined, rather than keeping them at the
usages of the macro. Those comments will become stale fast should we
change it to some other value (or make it configurable via a flag).
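
To illustrate the idea, here is a rough standalone sketch of a
time-limited cache where the concrete timeout is documented only at its
definition. This is not the code from the webrev: kCacheTimeout,
CachedProcessorCount and slow_query are hypothetical names, and the
sketch ignores locking and other concurrency concerns.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical stand-in for OSCONTAINER_CACHE_TIMEOUT. The concrete
// value (20ms) is mentioned here only, at the definition.
constexpr std::chrono::milliseconds kCacheTimeout{20};

// Hypothetical wrapper around an expensive "how many CPUs" query.
class CachedProcessorCount {
 public:
  using Clock = std::chrono::steady_clock;

  explicit CachedProcessorCount(std::function<int()> slow_query)
      : slow_query_(std::move(slow_query)) {}

  // Returns the cached value while it is still fresh. Otherwise re-runs
  // the expensive query and restarts the timeout (see kCacheTimeout).
  int get(Clock::time_point now = Clock::now()) {
    if (!valid_ || now >= expires_at_) {
      value_ = slow_query_();
      expires_at_ = now + kCacheTimeout;
      valid_ = true;
    }
    return value_;
  }

 private:
  std::function<int()> slow_query_;  // the expensive call being cached
  Clock::time_point expires_at_{};   // when the cached value goes stale
  int value_ = 0;
  bool valid_ = false;
};
```

With this shape, the comment at the usage refers to the constant by name,
so changing the timeout (or turning it into a flag) leaves no stale
"20ms" comments behind.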
Aside: This change conflicts with the cgroups v2 work I'm doing in .
I've got a local version which also incorporates your changes.
> Adding a single cache of active_processor_count with a 20ms timeout results in the following improvements.
> ORIGINAL SCORES
> 39346 calls/sec
> 40007 calls/sec
> 39708 calls/sec
> 39732 calls/sec
> 39736 calls/sec
> 39629 calls/sec
> DISABLE CONTAINER SUPPORT (-XX:-UseContainerSupport)
> 1106795 calls/sec
> 1135589 calls/sec
> 1123782 calls/sec
> 1123605 calls/sec
> 1123343 calls/sec
> 1123135 calls/sec
> 20MS TIMEOUT CACHE
> 9867172 calls/sec
> 10077977 calls/sec
> 9989674 calls/sec
> 9983963 calls/sec
> 9984400 calls/sec
> 9985898 calls/sec
> 9985123 calls/sec
> 9985608 calls/sec
> So the cache actually improves performance beyond the pre-container support by 8x but
> this of course only helps programs that call this function in a loop.