RFR (S): 8161993: 8161993: G1 crashes if active_processor_count changes during startup

David Holmes david.holmes at oracle.com
Fri Jul 22 02:36:51 UTC 2016


Hi Thomas,

On 21/07/2016 11:05 PM, Thomas Schatzl wrote:
> Hi David,
>
>   thanks for your review.

Thanks for picking up 8147910 for me. One of those little tasks I 
couldn't quite justify doing at the moment :)

> On Thu, 2016-07-21 at 22:31 +1000, David Holmes wrote:
>> Hi Thomas,
>>
>> On 21/07/2016 9:28 PM, Thomas Schatzl wrote:
>>>
>>> Hi all,
>>>
>>>   can I have reviews for this small change that fixes crashes with
>>> G1 when changing the number of available processors while running?
>>>
>>> The problem is that some data structures are allocated using the
>>> number
>>> of available processors at startup, and if you later change that
>>> number
>>> (i.e. increase it), g1 uses indices to access these data structures
>>> beyond their capacity.
>>>
>>> The problem is in DirtyCardQueueSet::num_par_ids() that re-queries
>>> the number of active processors, and then is used e.g. in
>>> G1FromCardCache to index an array.
>> To be clear num_par_ids() queries the number of configured processors
>> which is constant across the life of the VM. But the number of
>> parallel GC threads is determined by the number of active processors
>> available to the VM - that value may be less than the number of
>> configured processors. (The fact it can change over time doesn't seem
>> to be a factor here - only that it may be smaller.)
>>
>>>
>>>
>>> The proposed solution is to introduce an
>>> initial_active_processor_count() in the runtime that stays fixed
>>> after initialization. This corresponds to the suggestion in JDK-
>>> 8147910, but only fixing the immediate crash here.
>> I'm not really seeing what else JDK-8147910 would want to add other
>> than reporting the initial count in the error reporting, so I'd
>> rather see this split into two parts - and have both issues closed
>> out. All that would involve is committing the os.hpp and os.cpp and
>> any change to error reporting under 8147910, and the rest under
>> 8161993. A single review thread and combined push is still fine.
>
> Okay, can do the split of this CR into two.
>
> I briefly looked through the hs_err log file, but could not find any
> particular section where this information would fit well. Any
> suggestions?

In os.cpp I would modify os::print_cpu_info as follows:

void os::print_cpu_info(outputStream* st, char* buf, size_t buflen) {
   // cpu
   st->print("CPU:");
   st->print("total %d", os::processor_count());
   // It's not safe to query number of active processors after crash
   // st->print("(active %d)", os::active_processor_count());
+ // but we can report the initial number of active processors
+ st->print(" (initial active %d)", os::initial_active_processor_count());
   st->print(" %s", VM_Version::features_string());
   st->cr();
   pd_print_cpu_info(st, buf, buflen);
}

(I prefer the term "available" to "active" but the later is ingrained in 
the VM :(  ).

>
>> That said I would not hide this inside os::set_processor_count the
>> way you have. I can see you have done it to avoid changing all the
>> os_xxx.cpp files, but the name no longer really fits as it is not a
>> simple setter operation any more, and it is odd to pass in one value
>> but then internally pull the other one. Albeit more work I would have
>> added set_initial_active_processor_count(int initial) and
>> initial_active_processor_count(), then have the OS initialization
>> methods call both set_processor_count and
>> set_initial_active_processor_count. Or it may be there is somewhere
>> in the shared os initialization code we could add that, once, before
>> it needs to be used for the worker thread calculation. I'd have to
>> look at that in more detail tomorrow.
>
> The call could be put in the os::init_before_ergo() (or
> VMVersion::init_before_ergo()) call.
>
> I would prefer to avoid code duplication.

Agreed. init_before_ergo() seems most reasonable of the two - as this is 
os related. The important thing is that it happens after 
processor_count() is initialized (which happens earlier within os::init) 
and before anyone tries to query it (which I think the GC will do as 
part of the ergonomic setup). So an assertion in 
initial_active_processor_count() to ensure it is not zero would be good. 
And for good measure an assertion that it is zero in 
set_initial_active_processor_count().

>> Also if we log the initial_active_count we should log the
>> _processor_count too.
>
> I can log both in log messages and the error log - although processor
> count information is already in the log.

With the new proposal I would just add the logging to 
set_initial_active_processor_count().

Thanks,
David
-----

> Thanks,
>   Thomas
>


More information about the hotspot-gc-dev mailing list