RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)

Stefan Johansson stefan.johansson at oracle.com
Mon Nov 4 16:20:11 UTC 2019

Hi Sangheon,

Thanks for addressing my comments, some comments inline and additional 
comments below.

On 2019-11-02 07:08, sangheon.kim at oracle.com wrote:
>> ...
>> As we discussed offline, I would like to know the mismatches as well, 
>> I think the easiest approach would be to make the total count per node 
>> as well and that way we can see if there were any regions that didn't 
>> match. What do you think about printing the info like this:
>> [3,009s][trace][gc,heap,numa ] GC(6) NUMA region verification 
>> (actual/expected): 0: 1024/1024, 1: 270/1024, Unknown: 0
> Changed as you suggested to have per node but deleted unknown as unknown 
> is 'total - matched'.
I'm fine with skipping 'Unknown', but not all regions in (total - 
matched) must be 'Unknown' since there is the possibility of a real 
mismatch as well.
>> When testing this I also realized this output is problematic in the 
>> case where we have committed regions that have not yet been used. 
>> Reading the manual for get_mempolicy (the way we get the numa id for 
>> the address) say:
>> "If no page has yet been allocated for the specified address, 
>> get_mempolicy() will allocate a page as if the thread had performed a 
>> read (load) access to that address, and return the ID of the node 
>> where that page was allocated."
> Nice catch.
> Shortly saying get_mempolicy() doesn't honor os::numa_make_local() call 
> that we previously requested. And this problem occurs when 
> AlwaysPreTouch is disabled.
> However, my initial implementation which uses 'numa_move_pages()' 
> doesn't have this problem. So one fundamental solution would be 
> replacing linux implementation.
> In current scope of 3 patches, there will be no problem if we add 
> 'hr->free && !AlwaysPreTouch' condition check however 
> os::numa_get_group_id_for_address() will still have such limitation.
> What do you think about changing the Linux implementation?
> (webrev link is added at the end)
I agree that we should use numa_move_pages(), but I think we might want 
to change things a bit more because of this. I will add those comments 
below the patch to have all code comments in one place.

>> ... >> One can also see that this verification takes some time, so maybe it
>> would make sense to have this logging under gc+numa+verify.
> I think if we avoid calling G1NUMAA:index_of_address() if 'hr->free() && 
> !AlwaysPreTouch'  but count total, we will be fine.
> Basically I imported your patch but the verification is only happening 
> at the beginning of GC.

Any reason for not including it at the end of GC as well? I guess that 
could be good to be able to diagnose if there was an increase imbalance 
due to the collection.

>> ...
>>  166 void G1NodeTimes::print_phase_info(G1NodeTimes::NodeStatPhases 
>> phase) {
>>  167   LogTarget(Info, gc, heap, numa) lt;
>> I think this should be on debug level, but if you don't agree leave it 
>> as is.
> I feel Info seems okay, so let me leave as is.
>> ---
>>  191 void G1NodeTimes::print_mutator_alloc_stat_debug() {
>>  192   LogTarget(Debug, gc, heap, numa) lt;
>> And if you agree on moving the above to debug I think this should be 
>> on trace level.
> As is, please.
I'm okay with that, I just wanted to point out that compared to the 
other G1 logging on Info/Debug level these seemed to be one off.

> Here's the webrev:
> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4

Some additional comments:

2598 void G1CollectedHeap::verify_numa_regions() {
2599   LogTarget(Trace, gc, heap, numa) lt;

I realize I didn't include that in my example patch, but did you have 
anything against changing the tags here to 'gc, heap, verify'?

  281     _ls->print("%d: %u / %u, ", numa_ids[i], _matched[i], _total[i]);

This will leave a trailing ',' now when we don't have the 'Unknown' 
output. I think just having a space as the separator between the nodes 
could be fine to avoid this.

  292   if (hr->is_free() && !AlwaysPreTouch) {
  293     active_node_index = G1NUMA::UnknownNodeIndex;
  294   } else {
  295     active_node_index = _numa->index_of_address(hr->bottom());
  296   }
  298   if (preferred_node_index == active_node_index) {
  299     _matched[preferred_node_index]++;
  300   }
  301   _total[preferred_node_index]++;

As I touched upon above, I think we still lack some important 
information here. If we are to look at all committed regions, which 
probably is good, we need to keep a mismatch count as well. Otherwise we 
can't know if the diff between matched and total is just unused regions 
or actual mismatches without relying on other logs and analysis. Or am I 
missing something?

   30 class G1RegionCounts;

Just having a forward declaration here is not enough, so please add back 
the include.

99   void print_phase_info(G1NUMAStats::NodeDataItems phase);

Sorry for being so picky on the naming, but since we changed the class 
and enum I think this method, the comments and the parameters should 
change as well. For example:
void print_info(G1NUMAStats::NodeDataItems type);

> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc
> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc.numa_move_pages
As I said above, I think we should go with this implementation of 
getting the node given an address, but I have a comment:
3018   if (os::Linux::numa_move_pages(0, 1, pages, NULL, &id, 0) == -1) {
3019     return -1;
3020   }
3021   if (id < 0) {
3022     return -1;
3023   }

I think we should differ between the case where the call fails (return 
-1) and the id returned is negative. From reading the man_pages for 
'move_pages()' (used by numa_move_pages()) it seems we can expect to 
either get -EFAULT (for normal pages) or -ENOENT (for large pages) when 
a region has not yet been used. I'm not completely sure how much logic 
we want to add to this method, so maybe creating a RFE to look at this 
in more detail later is good. But for this first version I think your 
proposal is good.

One thought I just got, with this implementation is more like 'Unused', 
rather then 'Unknown'. But such a notion change could also be looked at 
in the RFE.


> Testing: hs-tier 1 ~ 5 with / without UseNUMA.
> Thanks,
> Sangheon
>> ---
>> This is it for now. Thanks,
>> Stefan
>>> Testing: hs-tier 1 ~ 4 with/without UseNUMA. hs-tier5 is almost 
>>> finished without new failures.
>>> Thanks,
>>> Sangheon

More information about the hotspot-gc-dev mailing list