[PATCH] JDK NUMA Interleaving issue
Derek.White at cavium.com
Mon Dec 3 21:55:59 UTC 2018
Thanks for running the test cases for JDK-8205051<https://bugs.openjdk.java.net/browse/JDK-8205051>. My theory was incorrect - there's a reason I didn't try to fix that bug :-)
As far as clearing the UseNUMA flag goes, there's a few positions to take when UseNUMA was specified, but the system has only 1 node, or the JVM is bound to 1 node:
1. Pretend the user never set UseNUMA - turn it off. (Note that this can easily occur using JVM configs in containers).
2. Set up the JVM to use resources that it can't access - don't pay attention to binding.
3. Pay attention to binding. Single-node JVM goes through the UseNUMA paths, but correctly acts as a non-NUMA system.
Currently the JVM does a combination of 1 & 2. And yes, 2 is a bug.
Your patch seems to be implementing 3, which is fine, and it's great that it actually works! But it's riskier in that it has to be correct for all uses of UseNUMA (including various collectors), and for future uses of UseNUMA.
I had an expectation of what UseNUMA did in this case, but GC isn't my area any more so I'll stay out of it now.
From: amith pawar <amith.pawar at gmail.com>
Sent: Monday, December 3, 2018 1:21 AM
To: White, Derek
Cc: thomas.schatzl at oracle.com; hotspot-dev at openjdk.java.net; Prakash.Raghavendra at amd.com; roshanmanagal at gmail.com; swatibits14 at gmail.com; roshanmangal at gmail.com
Subject: Re: [PATCH] JDK NUMA Interleaving issue
On Mon, Dec 3, 2018 at 11:49 AM amith pawar <amith.pawar at gmail.com<mailto:amith.pawar at gmail.com>> wrote:
On Sat, Dec 1, 2018 at 4:10 AM White, Derek <Derek.White at cavium.com<mailto:Derek.White at cavium.com>> wrote:
Thanks for tackling this!
I think that your patch may fix https://bugs.openjdk.java.net/browse/JDK-8205051, although it's probably fixing more than that. I was thinking that the problem was related to calling " _numa_interleave_memory_v2(start, size, _numa_all_nodes_ptr);", and you replaced that call with more specific node sets, which seems better to me.
You can try the test cases listed in the bug to see if the patch fixes the problem.
Tested and didn't fix it.
The existing code disables UseNUMA if the process is bound to only one node, but it looks like your patch removes this. Is that a correct read of the patch, intentional, and/or a good idea?
I removed that after testing it with SPECjbb for single node case.
With patch : Enabling/Disabling UseNUMA gives same score.
Without patch: UseNUMA scores 10-12% lesser than without UseNUMA
From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net<mailto:hotspot-dev-bounces at openjdk.java.net>> on behalf of amith pawar <amith.pawar at gmail.com<mailto:amith.pawar at gmail.com>>
Sent: Thursday, November 29, 2018 6:08 AM
To: thomas.schatzl at oracle.com<mailto:thomas.schatzl at oracle.com>
Cc: hotspot-dev at openjdk.java.net<mailto:hotspot-dev at openjdk.java.net>; Prakash.Raghavendra at amd.com<mailto:Prakash.Raghavendra at amd.com>
Subject: Re: [PATCH] JDK NUMA Interleaving issue
Sorry for the late reply and please check my inline comments below.
On Thu, Nov 22, 2018 at 5:18 PM Thomas Schatzl <thomas.schatzl at oracle.com<mailto:thomas.schatzl at oracle.com>>
> Hi Amith,
> welcome to the OpenJDK community! :)
On Fri, 2018-11-09 at 17:53 +0530, amith pawar wrote:
> > Hi all,
> > The flag UseNUMA (or UseNUMAInterleaving), has to interleave old gen,
> > S1 and S2 region (if any other ) memory areas on requested Numa nodes
> > and it should not configure itself to access other Numa nodes. This
> > issue is observed only when Java is allowed to run on fewer NUMA
> > nodes than available on the system with numactl membind and
> > interleave options. Running on all the nodes does not have any
> > effect. This will cause some applications (objects residing in old
> > gen and survivor region) to run slower on system with large Numa
> > nodes.
> [... long explanation...]
> Is it possible to summarize the problem and the changes with the
> following few lines:
> "NUMA interleaving of memory of old gen and survivor spaces (for
> parallel GC) tells the OS to interleave memory across all nodes of a
> NUMA system. However the VM process may be configured to be limited to
> run only on a few nodes, which means that large parts of the heap will
> be located on foreign nodes. This can incurs a large performance
> The proposed solution is to tell the OS to interleave memory only
> across available nodes when enabling NUMA."
OK. Since, I dont have write access to the defect link so can I request you
do the same.
> We have had trouble understanding the problem statement and purpose of
> this patch when triaging (making sure the issue is understandable and
> can be worked on) as the text is rather technical. Having an easily
> understandable text also helps reviewers a lot.
> Assuming my summary is appropriate, I have several other unrelated
> - could you propose a better subject for this work? "JDK NUMA
> Interleaving issue" seems very generic. Something like "NUMA heap
> allocation does not respect VM membind/interleave settings" maybe?
This is also OK.
> - there have been other NUMA related patches from AMD recently, in
> particular JDK-what is the relation to JDK-8205051? The text there
> reads awfully similar to this one, but I *think* JDK-8205051 is
> actually about making sure that the parallel gc eden is not put on
> inactive nodes.
> Can you confirm this (talk to your colleague) so that we can fix the
> description too?
JDK-8205051 is related to early GC.
JDK-8189922 specific to membind case where heap was allocated on non
requested NUMA nodes.
The proposed patch fixes two following issues
1. Similar to JDK-8189922 but for interleave case.
2. OLDGen, S1 and S2 memory interleaving is incorrect when run on fewer
> - fyi, we are currently working on NUMA aware memory allocation support
> for G1 in JEP 345 (JDK-8210473). It may be useful to sync up a bit to
> not step on each other's toes (see below).
We are not working anything related to G1. It may effect G1 if
numa_make_global function is called.
> [... broken patch...]
> I tried to apply the patch to the jdk/jdk tree, which failed; I then
> started to manually apply it but stopped after not being able to find
> the context of some hunks. I do not think this change applies to the
> latest source tree.
> Please make sure that the patch applies to the latest jdk/jdk tree with
> errors. All changes generally must first go into the latest dev tree
> before you can apply for backporting.
> Could you please send the patch as attachment (not copy&pasted) to this
> list and cc me? Then I can create a webrev out of it.
> I did look a bit over the patch as much as I could (it's very hard
> trying to review a diff), some comments:
Sorry. Please check the attached patch.
> - the _numa_interleave_memory_v2 function pointer is never assigned
> to in the patch in the CR, so it will not be used. Please make sure the
> patch is complete.
> Actually it is never defined anywhere, ie. the patch unlikely actually
> compiles even if I could apply it somewhere.
> Please avoid frustrating reviewers by sending incomplete patches.
Sorry again. Faced same issue when copied from the defect link but worked
for me from the mail. Now i have attached the patch. Can you please update
defect link with this patch ?
> - I am not sure set_numa_interleave, i.e. the interleaving, should be
> done without UseNUMAInterleaving enabled. Some collectors may do their
> own interleaving in the future (JDK-8210473 / JEP-345) so this would
> massively interfere in how they work. (That issue may be because I am
> looking at a potentially incomplete diff, so forgive me if the patch
> already handles this).
Testing SPECjbb with UseNUMAInterleaving enabled/disabled has no effect
when java externally invoked in interleave mode. It helps membind case.
> - if some of the actions (interleaving/membind) fail, and it had been
> requested, it would be nice to print a log message.
Following two NUMA API's are used and they return nothing. Right now not
sure which cases to handle. Please suggest
void numa_tonode_memory(void *start, size_t size, int node);
void numa_interleave_memory(void *start, size_t size, struct bitmask
> Actually it would be nice to print information about e.g. the bitmask
> anyway in the log so that the user can easily verify that what he
> specified on the command line has been properly picked up by the VM -
> instead of looking through the proc filesystem.
As per the suggestion, patch is updated to print following information.
[0.001s][info][os] UseNUMA is enabled
[0.001s][info][os] Java is configured to run in interleave mode
[0.001s][info][os] Heap will be configured using NUMA memory nodes: 0, 2,
With best regards,
More information about the hotspot-dev