RFR: 2178143: VM crashes if the number of bound CPUs changed during runtime

Karen Kinnear karen.kinnear at oracle.com
Thu Mar 21 06:54:11 PDT 2013


Yumin,

I really like the suggestion from David and from Harold that we simplify this fix to just be a boolean - isMP, and limit the scope of
the change to isMP. It won't change the number of processors, and won't tune everything in the libraries etc. according to actual cpus or
processors sets. The original goal was to allow folks to get the MP locking even if there is only one processor when the vm starts up.
Let's scale back to that if you don't mind.

thanks,
Karen

On Mar 21, 2013, at 12:37 AM, Yumin Qi wrote:

> Hi, David
> 
> On 3/20/2013 6:02 PM, David Holmes wrote:
>> On 21/03/2013 7:27 AM, Yumin Qi wrote:
>>> Hi, can I have your code review of a small change?
>> 
>> Not really small conceptually. :)
>> 
>> I don't think this form of the fix addresses the underlying issue as discussed in the bug report. If the variable was renamed MinimumNumberOfProcessors, or MinimumAssumedProcessors, then simply using it to turn on is_MP would be okay. Such a flag would suit the initial problem perfectly. Of course any value >2 would be semantically indistinct, so this really acts as a boolean flag - AssumeMP.
>> 
>> The more general NumberOfProcessors approach, which I'm still unsure of, should to me control what is reported for available-processors. That way it would affect everything in the VM, libraries and application code that configures itself based on the number of available processors. The main usecase for that, in my opinion, would be for apps running on large systems but you want to constrain it to using a subset of the physical CPUs (without having to configure processor sets). That is a different kind of problem and a different kind of flag. Using NumberOfProcessors but not having it control anything except is_MP just seems wrong - and using it to replace available_processors is not a complex change.
>> 
>> The VM is not designed for dynamic adaptation of threads/pools so if the number of processors does change dynamically neither of the above options are going to provide solutions to the potential performance problems that will be encountered (too many or too few threads). Any apps that starts on single core (as reported by the OS) is going to be under-provisioned.
>> 
>> David
>> -----
>> 
> I think this is only a workaround and not a solution for the specific use case, or  there is no perfect solution for it. If customers decided to use this flag,  they should be aware that they are ready not to consider performance at first.  For using number of available processors, we need to add code to get that number, not the one in hotspot os::active_processor_count() which will return the number of live processors.  So do you think I could just use a flag -XX:+AssumeMP work around  this problem? Since GC threads will not be based on this assumption, I agree (Harold pointed this out either) that one flag is simpler.  In fact, the code for is_MP() is obsolete for today's computers, all with multi-cores (even for cell chips) so I think better solution is remove call to is_MP() in all places in hotspot. I will prepare another webrev with -XX:+AssumeMP for next codereview.
> 
> For number of GC Threads:
> unsigned int Abstract_VM_Version::nof_parallel_worker_threads(
>                                                      unsigned int num,
>                                                      unsigned int den,
>                                                      unsigned int switch_pt) {
>  if (FLAG_IS_DEFAULT(ParallelGCThreads)) {
>    assert(ParallelGCThreads == 0, "Default ParallelGCThreads is not 0");
>    // For very large machines, there are diminishing returns
>    // for large numbers of worker threads.  Instead of
>    // hogging the whole system, use a fraction of the workers for every
>    // processor after the first 8.  For example, on a 72 cpu machine
>    // and a chosen fraction of 5/8
>    // use 8 + (72 - 8) * (5/8) == 48 worker threads.
>    unsigned int ncpus = (unsigned int) os::active_processor_count();
>    return (ncpus <= switch_pt) ?
>           ncpus :
>          (switch_pt + ((ncpus - switch_pt) * num) / den);
>  } else {
>    return ParallelGCThreads;
>  }
> }
> 
> the call to this function is
> unsigned int Abstract_VM_Version::calc_parallel_worker_threads() {
>  return nof_parallel_worker_threads(5, 8, 8);
> }
> 
> We can see that, if active_processor_count is 1, it will return 1, and the VM will run with single GC thread. So the better choices maybe:
> 
> 1) get processor count not active processor count for ParallelGCThreads, that is up to decision from GC team.
> 2) Recommend usage is
> 
>  -XX:+AssumeMP -XX:ParallelGCThreads=<number>
> 
> Thanks
> Yumin
> 
>>> 2178143:  VM crashes if the number of bound CPUs changed during runtime.
>>> 
>>> Situation: Customer first configure only one CPU online and turn others
>>> offline to run java application, after java program started, bring more
>>> CPUs back online. Since VM started on a single CPU, os::is_MP() will
>>> return false, but after more CPUs available, OS will schedule the app
>>> run on multiple CPUs, this caused SEGV in various places where data
>>> consistency was broken. The solution is supply a flag to assume it is
>>> running on MP, so lock is forced to be called.
>>> 
>>> http://cr.openjdk.java.net/~minqi/2178143/
>>> 
>>> Thanks
>>> Yumin
> 



More information about the hotspot-gc-dev mailing list