RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)

Kim Barrett kim.barrett at oracle.com
Wed Sep 25 01:44:00 UTC 2019

> On Sep 21, 2019, at 1:19 AM, sangheon.kim at oracle.com wrote:
> webrev:
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1.inc (this may not help much! :) )
> Testing: hs-tier 1 ~ 5 (with/without UseNUMA)

  96   uint _node_index;

Protected; should be private.

  42   _mutator_alloc_region(NULL),

Should be _mutator_alloc_regions (plural), since it's now an array.

Similarly, these should be pluralized:
  67 void G1Allocator::init_mutator_alloc_region() {
  74 void G1Allocator::release_mutator_alloc_region() {

And this
  48   // The number of MutatorAllocRegions used, one per memory node.
  49   size_t _num_alloc_region;

  53 G1Allocator::~G1Allocator() {
  54   for (uint i = 0; i < _num_alloc_region; i++) {
  55     _mutator_alloc_region[i].~MutatorAllocRegion();
  56   }
  57   FREE_C_HEAP_ARRAY(MutatorAllocRegion, _mutator_alloc_region);
  58 }

--- should also be calling _mutator_alloc_region[i].release() ??
--- or does destructor do that?

 161   if (UseNUMA) {
 162     if (FLAG_IS_DEFAULT(AlwaysPreTouch)) {
 163       FLAG_SET_DEFAULT(AlwaysPreTouch, true);
 164     }
 165     if (!AlwaysPreTouch && FLAG_IS_CMDLINE(AlwaysPreTouch)) {
 166       warning("Disabling AlwaysPreTouch is incompatible with UseNUMA. Disabling UseNUMA.");
 167       FLAG_SET_ERGO(UseNUMA, false);
 168     }
 169   }

Stefan asked about why AlwaysPreTouch is required when UseNUMA. I have
a different question. Assuming UseNUMA does require AlwaysPreTouch,
why is !AlwaysPreTouch winning here? Why not have UseNUMA win if they
are conflicting?

But see discussion below about
G1RegionsSmallerThanCommitSizeMapper::commit_regions(), which
suggested AlwaysPreTouch is required.

  83 G1PageBasedVirtualSpace::~G1PageBasedVirtualSpace() {
  92   _numa                   = NULL;
  93 }

[pre-existing] Destructors are for resource management. Nulling out /
zeroing out members in a destructor generally isn't useful. This is
really a comment on the existing code rather than a request to change
anything. The addition of line 92 is okay in context, just the context
is not good.

 108         _next_node_index(G1MemoryNodeManager::mgr()->num_active_nodes() - 1),
 109         _max_node_index(G1MemoryNodeManager::mgr()->num_active_nodes()) {

Consider reversing the order of these members and their initializers,
so the _next_node_index can use _max_node_index rather than another
call to num_active_nodes().

 113     uint next_node_index() const {
 114       return _next_node_index;
 115     }

I think this is mis-named.  It's the current index for the
distributor.  I think it should just be called "node_index".


I'm confused by G1RegionsSmallerThanCommitSizeMapper::commit_regions().

For the LargerThan mapper, we have a sequence of regions that
completely covers a sequence of pages, and we commit all of the
associated pages using the requested node_index.

For the SmallerThan mapper, we have a sequence of regions split up
into subsequences that are each contained in a single page.  The first
such subsequence might be on an already committed page.  Similarly for
the last subsequence.  Nothing is done to those pages.

In between there may be a series of region sequences, with each region
sequence on a single page.  If there are more than one of these region
sequences then more than one page will need to be committed.

As we step through the seuqnce of pages and commit them, we also step
the numa index to use for each page.

Stefan asked a question in this area about the mechanism by which the
node stepping is provided, and you responded with what sounds like an
improvement.  But I have a different question.

Why are we committing different pages on different numa nodes? The
caller requested these regions be on the requested node. Why are we
not honoring that request (as much as possible within the constraints
of possible leading and trailing regions being on already committed
pages.) The comment for G1NodeDistributor discusses (at a high level)
what it's doing (e.g. a short summary of the above description), but
there is no discussion of why that distribution is needed or

There might be a good reason for this behavior, in which case your
response with an improvement sounds good.  But if so, I'm guessing I
won't be the only one who doesn't know what that reason might be, and
it would be good to provide an explanatory comment.  And of course, if
there isn't a good reason...

I think there is also a problem here if AlwaysPreTouch is false. (As
discussed earlier, maybe it isn't required to be true.) The node index
for the committed regions gets set (in make_regions_available) via the
result of the syscall, so we really need pretouch to have been done.
The alternative would be to assume commit_regions used the requested
numa node.  But with the request stepping that wouldn't hold.  Of
course, it also doesn't hold for any leading or trailing regions that
were covered by already committed pages.

I think this is the basis of your argument that AlwaysPreTouch is
required for UseNUMA, and I think I'm now agreeing.  Otherwise we may
think the leading and trailing regions in the sequence are on a
different node than they actually are, since the associated pages may
have already been committed on a different node than requested, but
not yet touched.

But I still don't know why we would want to cycle through nodes for
the "middle" pages.

 127     if (G1VerifyNUMAIdOfHeapRegions) {
 128       // Read actual node index via system call.
 129       uint actual_node_index = mgr->index_of_address(hr->bottom());
 130       if (hr->node_index() != actual_node_index) {

Can we actually do this here?  I thought the system call only gave a
useful answer if the addressed location has been paged in.  I'm not
sure that's necessarily happened at this point.

I think Stefan suggested the logging of mismatches between requested
and actual numa node for a region should occur at region retirement.
We could log mismatches there and correct the region's information.

But see discussion above about
G1RegionsSmallerThanCommitSizeMapper::commit_regions().  If
AlwaysPreTouch is indeed required, then this code is okay.

 156   HeapRegion * cur;

s/HeapRegion */HeapRegion*/

  38   static const uint InvalidNodeIndex = (uint)os::InvalidId;
  39   static const uint AnyNodeIndex = (uint)os::AnyId;

I complained about these in my review of webrev.0. These are making
very strong assumptions about the values of the os Id values, for no
good reason that I can see. You responded

"But the intend is to make same value after casting for same meaning
constants instead of randomly chosen ones."

I don't buy that. There aren't any casts that I can see between NUMA
ids and indexes. Nor should there be any such casts. If there were,
I'd strongly question them, as being units mismatches.

  54   virtual const int* node_ids() const { static int dummy_id = 0; return &dummy_id; }

dummy_id should be const.

I would probably put that definition in the .cpp file. I've run into
multiple compiler bugs with function scoped static variables in inline
functions. Not recently, but I'm paranoid.

  34 class G1MemoryNodeManager : public CHeapObj<mtGC> {
  43   static G1MemoryNodeManager* create();

Given that we have a factory function that should be used for
creation, the constructor ought to be non-public.  It needs to be
protected so the derived G1MemoryNodeManager can refer to it.

A different approach would have G1MemoryNodeManager be abstract (with
all virtuals but the destructor being pure), with hidden (possibly
private nested) classes for the single-node and multi-node cases.

  62   if (UseNUMA SOLARIS_ONLY(&& false)) {

I thought we were only providing Linux support.  This seems like it
would attempt to work (and probably fail somewhere later) on other
platforms (anything not Linux or Solaris).

  64     G1NUMA* numa = new G1NUMA();
  66     if (numa != NULL) {

numa cannot be NULL here.

  86 G1MemoryMultiNodeManager::~G1MemoryMultiNodeManager() {
  87   delete _numa;
  88 }

This is leaving a stale pointer to the G1NUMA object in wherever
G1NUMA::set_numa stashed it.

1500   _mem_node_mgr(G1MemoryNodeManager::create()),

Maybe this manager should be created in G1CH::initialize() (and
initialized here to NULL).

Then the page_size could be passed to create, and there wouldn't be a
need to later set the page size of the manager and pass that along to
the G1NUMA, instead both getting it as a constructor argument.  Then
the associated setters go away too.

  95   // Gets a next valid numa id.
  96   inline int next_numa_id();

Appears to be unused.

 102 uint G1MemoryMultiNodeManager::index_of_current_thread() const {
 103   int node_id = os::numa_get_group_id();
 104   return _numa->index_of_numa_id(node_id);
 105 }

Other than here, os::numa_xxx usage is encapsulated in G1NUMA, with
the manager forwarding to the G1NUMA object as needed.  I suggest
doing that here too.  (Note that this file doesn't #include os.hpp.)
I think doing so eliminates the need for G1NUMA::index_of_numa_id(),
which also seems like a good thing.

  92 void G1NUMA::touch_memory(address aligned_address, size_t size_in_bytes, uint numa_index) {

Assert aligned_address is page aligned?
Assert size_in_bytes is a page aligned?

  42   memset(_numa_id_to_index_map, 
  43          G1MemoryNodeManager::InvalidNodeIndex,
  44          sizeof(uint) * _len_numa_id_to_index_map);

memset only works here because all bytes of InvalidNodeIndex happen to
have the same value.  I would prefer an explicit fill loop rather than
memset here.  Or a static assert on the value, but that's probably
more code.


More information about the hotspot-gc-dev mailing list