[PATCH] JDK NUMA Interleaving issue
Derek.White at cavium.com
Fri Nov 30 22:40:44 UTC 2018
Thanks for tackling this!
I think that your patch may fix https://bugs.openjdk.java.net/browse/JDK-8205051, although it's probably fixing more than that. I was thinking that the problem was related to calling " _numa_interleave_memory_v2(start, size, _numa_all_nodes_ptr);", and you replaced that call with more specific node sets, which seems better to me.
You can try the test cases listed in the bug to see if the patch fixes the problem.
The existing code disables UseNUMA if the process is bound to only one node, but it looks like your patch removes this. Is that a correct read of the patch, intentional, and/or a good idea?
From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> on behalf of amith pawar <amith.pawar at gmail.com>
Sent: Thursday, November 29, 2018 6:08 AM
To: thomas.schatzl at oracle.com
Cc: hotspot-dev at openjdk.java.net; Prakash.Raghavendra at amd.com
Subject: Re: [PATCH] JDK NUMA Interleaving issue
Sorry for the late reply and please check my inline comments below.
On Thu, Nov 22, 2018 at 5:18 PM Thomas Schatzl <thomas.schatzl at oracle.com>
> Hi Amith,
> welcome to the OpenJDK community! :)
On Fri, 2018-11-09 at 17:53 +0530, amith pawar wrote:
> > Hi all,
> > The flag UseNUMA (or UseNUMAInterleaving), has to interleave old gen,
> > S1 and S2 region (if any other ) memory areas on requested Numa nodes
> > and it should not configure itself to access other Numa nodes. This
> > issue is observed only when Java is allowed to run on fewer NUMA
> > nodes than available on the system with numactl membind and
> > interleave options. Running on all the nodes does not have any
> > effect. This will cause some applications (objects residing in old
> > gen and survivor region) to run slower on system with large Numa
> > nodes.
> [... long explanation...]
> Is it possible to summarize the problem and the changes with the
> following few lines:
> "NUMA interleaving of memory of old gen and survivor spaces (for
> parallel GC) tells the OS to interleave memory across all nodes of a
> NUMA system. However the VM process may be configured to be limited to
> run only on a few nodes, which means that large parts of the heap will
> be located on foreign nodes. This can incurs a large performance
> The proposed solution is to tell the OS to interleave memory only
> across available nodes when enabling NUMA."
OK. Since, I dont have write access to the defect link so can I request you
do the same.
> We have had trouble understanding the problem statement and purpose of
> this patch when triaging (making sure the issue is understandable and
> can be worked on) as the text is rather technical. Having an easily
> understandable text also helps reviewers a lot.
> Assuming my summary is appropriate, I have several other unrelated
> - could you propose a better subject for this work? "JDK NUMA
> Interleaving issue" seems very generic. Something like "NUMA heap
> allocation does not respect VM membind/interleave settings" maybe?
This is also OK.
> - there have been other NUMA related patches from AMD recently, in
> particular JDK-what is the relation to JDK-8205051? The text there
> reads awfully similar to this one, but I *think* JDK-8205051 is
> actually about making sure that the parallel gc eden is not put on
> inactive nodes.
> Can you confirm this (talk to your colleague) so that we can fix the
> description too?
JDK-8205051 is related to early GC.
JDK-8189922 specific to membind case where heap was allocated on non
requested NUMA nodes.
The proposed patch fixes two following issues
1. Similar to JDK-8189922 but for interleave case.
2. OLDGen, S1 and S2 memory interleaving is incorrect when run on fewer
> - fyi, we are currently working on NUMA aware memory allocation support
> for G1 in JEP 345 (JDK-8210473). It may be useful to sync up a bit to
> not step on each other's toes (see below).
We are not working anything related to G1. It may effect G1 if
numa_make_global function is called.
> [... broken patch...]
> I tried to apply the patch to the jdk/jdk tree, which failed; I then
> started to manually apply it but stopped after not being able to find
> the context of some hunks. I do not think this change applies to the
> latest source tree.
> Please make sure that the patch applies to the latest jdk/jdk tree with
> errors. All changes generally must first go into the latest dev tree
> before you can apply for backporting.
> Could you please send the patch as attachment (not copy&pasted) to this
> list and cc me? Then I can create a webrev out of it.
> I did look a bit over the patch as much as I could (it's very hard
> trying to review a diff), some comments:
Sorry. Please check the attached patch.
> - the _numa_interleave_memory_v2 function pointer is never assigned
> to in the patch in the CR, so it will not be used. Please make sure the
> patch is complete.
> Actually it is never defined anywhere, ie. the patch unlikely actually
> compiles even if I could apply it somewhere.
> Please avoid frustrating reviewers by sending incomplete patches.
Sorry again. Faced same issue when copied from the defect link but worked
for me from the mail. Now i have attached the patch. Can you please update
defect link with this patch ?
> - I am not sure set_numa_interleave, i.e. the interleaving, should be
> done without UseNUMAInterleaving enabled. Some collectors may do their
> own interleaving in the future (JDK-8210473 / JEP-345) so this would
> massively interfere in how they work. (That issue may be because I am
> looking at a potentially incomplete diff, so forgive me if the patch
> already handles this).
Testing SPECjbb with UseNUMAInterleaving enabled/disabled has no effect
when java externally invoked in interleave mode. It helps membind case.
> - if some of the actions (interleaving/membind) fail, and it had been
> requested, it would be nice to print a log message.
Following two NUMA API's are used and they return nothing. Right now not
sure which cases to handle. Please suggest
void numa_tonode_memory(void *start, size_t size, int node);
void numa_interleave_memory(void *start, size_t size, struct bitmask
> Actually it would be nice to print information about e.g. the bitmask
> anyway in the log so that the user can easily verify that what he
> specified on the command line has been properly picked up by the VM -
> instead of looking through the proc filesystem.
As per the suggestion, patch is updated to print following information.
[0.001s][info][os] UseNUMA is enabled
[0.001s][info][os] Java is configured to run in interleave mode
[0.001s][info][os] Heap will be configured using NUMA memory nodes: 0, 2,
More information about the hotspot-dev