RFR: 8243961: ForceNUMA and only one available NUMA node fails assertion on Windows
kim.barrett at oracle.com
Sat May 2 09:59:32 UTC 2020
> On Apr 30, 2020, at 6:24 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> On 29.04.20 14:28, Kim Barrett wrote:
>> Please review this Windows-specific change to the initialization of
>> mach5 tier1-3
>> mach5 testing on Windows with explicit +UseG1GC +UseNUMA +ForceNUMA and
>> each of +/-/(default)UseNUMAInterleaving.
>> TestUseNUMAInterleaving fails for some of those test configurations on
>> non-NUMA hardware, because the test expects UseNUMAInterleaving to be
>> enabled if UseNUMA is enabled. As the test failure can only happen
>> with +ForceNUMA, that configuration doesn't seem to otherwise lead to
>> other problems, +ForceNUMA doesn't seem to be a tested configuration,
>> and ForceNUMA is going away (JDK-8243628), we'll not worry about that
> Change is okay, but what do you think about always disabling UseNUMA regardless of ForceNUMA on Windows instead?
> There does not seem to be a code path in the windows OS files that acts on UseNUMA. For NUMA related methods it returns default values, so any path in the GCs will just do nothing or act on useless default values from the os layer and do weird things.
> UseNUMAInterleaving is completely separate from UseNUMA functionality anyway (although "UseNUMA" is misleading at least to me, but that's another discussion).
> There is the problem then that probably many people enable UseNUMA to get UseNUMAInterleaving, but that could be handled in a windows specific way too. I.e. when -XX:+UseNUMA and -XX:+/-ForceNUMA is set, disable UseNUMA and enable UseNUMAInterleaving.
> That sounds much easier to understand to me than the current change.
> Some collectors (Z, Shenandoah) will set UseNUMA, but this evaluation is before os::init_2 so above idea would do the right thing as neither have any special UseNUMA path. I.e. both actually seem to want UseNUMAInterleaving. Shenandoah explicitly says so, so actually changing them to enable UseNUMAInterleaving instead of UseNUMA would be good too imo.
> On non-Windows there will be no difference to now.
[Added ppc-aix-port-dev, as there is a small AIX change here.]
I considered just unconditionally disabling UseNUMA on Windows. That
would also be sufficient to fix the bug, since the later +UseNUMA =>
+UseNUMAInterleaving wouldn't happen. But I felt that was leaving a
booby trap for whoever gets to deal with implementing UseNUMA on
Windows.
To avoid that, the later +UseNUMA => +UseNUMAInterleaving needs to be
suppressed anyway. In which case there's no need (for this bug) to do
anything about UseNUMA/ForceNUMA.
Also, that would be a behavioral change if someone is actually using
-XX:+ForceNUMA (and either -XX:+UseNUMA or using a collector that
implicitly turns it on (ZGC and Shenandoah)) and expecting the current
behavior that -XX:+UseNUMAInterleaving will also be implicitly
enabled. Not that I think that's likely, but with ForceNUMA still
being a product option...
For other platforms I don't want to get into second guessing whether
someone said +UseNUMA but really meant +UseNUMAInterleaving.
I looked at moving the defaulting of +UseNUMAInterleaving when
+UseNUMA earlier. It could be done in Arguments::apply_ergo. But then
if UseNUMA ends up ultimately getting disabled we should undo that
Digging further into this, I discovered that on Linux (at least)
-XX:+UseNUMAInterleaving is effectively ignored if -UseNUMA, but not
really in a good way. +UseNUMAInterleaving controls whether
numa_make_global gets called in a couple of places. But that calls
into stuff that is set up by libnuma_init, which is only called if
+UseNUMA. numa_make_global copes with that lack of setup by checking
for it and (not completely trivially) doing nothing if not set up.
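
To make the Linux behavior described above concrete, here is a minimal
model of it, not the HotSpot code. All names ending in _model, and the
NumaStateModel struct, are hypothetical stand-ins; the real functions
have different signatures and do real work.

```cpp
// Hypothetical stand-in for the state that libnuma_init sets up.
struct NumaStateModel {
  bool initialized = false;   // true only after a successful init
  int interleave_calls = 0;   // how many requests actually took effect
};

// Models "libnuma_init is only called if +UseNUMA".
void os_init_model(NumaStateModel& s, bool use_numa, bool init_succeeds) {
  if (use_numa) {
    s.initialized = init_succeeds;
  }
}

// Models numa_make_global: copes with missing setup by doing nothing.
void numa_make_global_model(NumaStateModel& s) {
  if (!s.initialized) {
    return;  // silently ignored; the caller is not told
  }
  ++s.interleave_calls;
}

// Models a call site: +UseNUMAInterleaving decides whether to call at all.
void reserve_memory_model(NumaStateModel& s, bool use_numa_interleaving) {
  if (use_numa_interleaving) {
    numa_make_global_model(s);
  }
}
```

With -UseNUMA +UseNUMAInterleaving the call site still calls
numa_make_global_model, but the request has no effect, which is the
"effectively ignored" behavior.
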
What I've ended up doing is moving the conditional enabling of
UseNUMAInterleaving into the platform-specific code, where there's
enough information to consistently get it right. I've also made all
the os variants be explicit about UseNUMA and UseNUMAInterleaving;
those that have no support for either now unconditionally set them
false. (This happens to keep TestUseNUMAInterleaving working on those
(No incremental; the code change to os_windows.cpp is the same, but
the rationale and commentary are entirely different.)
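
A rough sketch of the shape of the platform-specific flag handling
described above, not the actual change. NumaFlagsModel, both function
names, and the single numa_available parameter are hypothetical; the
real code uses the VM flag machinery and per-platform checks. Treating
"UseNUMAInterleaving left at its default" as the condition for enabling
it is also an assumption of this sketch.

```cpp
// Hypothetical model of the two flags plus whether the user set the second.
struct NumaFlagsModel {
  bool use_numa = false;
  bool use_numa_interleaving = false;
  bool interleaving_is_default = true;  // false if given on the command line
};

// Platforms with no support for either flag: unconditionally set both false.
void init_numa_flags_unsupported(NumaFlagsModel& f) {
  f.use_numa = false;
  f.use_numa_interleaving = false;
}

// Platforms with support: decide once the platform knows whether NUMA is
// usable. numa_available stands in for whatever the platform checks.
void init_numa_flags_supported(NumaFlagsModel& f, bool numa_available) {
  if (!numa_available) {
    // Nothing to interleave across, so neither flag can be honored.
    f.use_numa = false;
    f.use_numa_interleaving = false;
    return;
  }
  if (f.use_numa && f.interleaving_is_default) {
    // +UseNUMA => +UseNUMAInterleaving, unless the user chose explicitly.
    f.use_numa_interleaving = true;
  }
}
```

Doing the defaulting here, rather than earlier in argument processing,
means there is never a defaulted +UseNUMAInterleaving to undo if
UseNUMA ends up disabled.
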
mach5 tier1-3, normally, with -XX:+UseNUMA -XX:+ForceNUMA added.
TestUseNUMAInterleaving still fails on Windows with +UseNUMA
+ForceNUMA on a single-node machine, because +UseNUMAInterleaving gets
turned off, which is not what the test expects.
Manually verified that on Linux -XX:+UseNUMAInterleaving alone gets
overridden off when libnuma_init is patched to fail.