RFR(S): 8009536: G1: Apache Lucene hang during reference processing

John Cuthbertson john.cuthbertson at oracle.com
Mon Mar 18 09:37:03 PDT 2013


Hi Bengt,

Thanks. I'll make the change as you suggest. For some reason the push 
job failed on MacOS. It looks exactly like the problem these changes are 
supposed to fix.

JohnC

On 3/18/2013 4:40 AM, Bengt Rutisson wrote:
>
> Hi John,
>
> This looks good to me. Just like I thought ;)
>
> One nit if you feel like fixing it before you push:
>
> In a couple places you have this pattern:
>
> 2240           _task->do_marking_step(mark_step_duration_ms,
> 2241                                  false      /* do_termination */,
> 2242                                  _is_serial /* is_serial      */);
>
> I think the comment /* is_serial */ is superfluous now that we pass a 
> variable named _is_serial instead of just true or false.
>
> Bengt
>
>
> On 3/14/13 6:19 PM, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> Here's the final webrev based upon Bengt's comments:
>>
>> http://cr.openjdk.java.net/~johnc/8009536/webrev.2/
>>
>> I think I'm all set after reviews from Bengt and Thomas - I just 
>> wanted to send out the final version of the webrev in case anyone 
>> else has other comments.
>>
>> Thanks,
>>
>> JohnC
>>
>> On 3/13/2013 2:01 PM, Bengt Rutisson wrote:
>>>
>>> Hi John,
>>>
>>> Thanks for explaining a bit about the serial case where we still 
>>> need to go through termination protocol.
>>>
>>> As you know I'll be on vacation the next two days, so I just wanted 
>>> to let you know that with the changes that you outline below (the 
>>> fix to CMRemarkTask and removing the do_stealing argument) I'm fine 
>>> with this change. Feel free to push this if you get another review.
>>>
>>> Thanks,
>>> Bengt
>>>
>>> On 3/13/13 6:57 PM, John Cuthbertson wrote:
>>>> Hi Bengt,
>>>>
>>>> Thanks for looking at the change again. Replies inline....
>>>>
>>>> On 3/13/2013 3:22 AM, Bengt Rutisson wrote:
>>>>>
>>>>>
>>>>> Hi John,
>>>>>
>>>>> I'm not too familiar with the do_marking_step() code, so I'm 
>>>>> trying to get to know it while I'm looking at your webrev. So, 
>>>>> maybe I'm completely off here, but I think there are a couple of 
>>>>> strange things.
>>>>>
>>>>> After your change do_marking_step() takes three bools: 
>>>>> do_stealing, do_termination and is_serial.
>>>>>
>>>>> This means that we have 8 possible states if we combine all value 
>>>>> of these bools. I went through all calls to do_marking_step() and 
>>>>> it seems like we make use of 4 of those states:
>>>>>
>>>>>
>>>>> G1CMKeepAliveAndDrainClosure::do_oop_work()
>>>>>   do_marking_step(false, false, true)             (1)
>>>>>
>>>>> CMConcurrentMarkingTask::work()
>>>>>   do_marking_step(true, true, false)              (2)
>>>>>
>>>>> CMRemarkTask::work()
>>>>>   do_marking_step(true, true, false)              (2)
>>>>>   do_marking_step(true, true, true)               (3)
>>>>>
>>>>> G1CMDrainMarkingStackClosure::do_void()
>>>>>   do_marking_step(false, true, true)              (4)
>>>>>   do_marking_step(true, true, false)              (2)
>>>>>
>>>>
>>>> Yeah. Well...
>>>>
>>>>>
>>>>> I numbered the states (1) - (4). Here are all states and I marked 
>>>>> which ones we use:
>>>>>
>>>>>  stealing,  termination,  serial    (3)
>>>>>  stealing, !termination,  serial
>>>>>  stealing, !termination, !serial
>>>>>  stealing,  termination, !serial    (2) parallel
>>>>> !stealing,  termination,  serial    (4)
>>>>> !stealing, !termination,  serial    (1) serial
>>>>> !stealing, !termination, !serial
>>>>> !stealing,  termination, !serial
>>>>>
>>>>>
>>>>> I think state (2) makes sense as the parallel case. We do stealing 
>>>>> and termination and we are not serial. State (1) also makes sense. 
>>>>> It is clearly the serial case. No stealing or termination. Just 
>>>>> serial.
>>>>>
>>>>> Now, case (3) looks odd to me. We say we are serial but we want to 
>>>>> do stealing and termination. This case comes from CMRemarkTask. It 
>>>>> takes a is_serial argument in its constructor. We get to case (3) 
>>>>> from ConcurrentMark::checkpointRootsFinalWork(), where we do:
>>>>>
>>>>>   if (G1CollectedHeap::use_parallel_gc_threads()) {
>>>>>     ...
>>>>>   } else {
>>>>>     ...
>>>>>     CMRemarkTask remarkTask(this, active_workers, true /* 
>>>>> is_serial*/);
>>>>>   }
>>>>>
>>>>> To me this looks like it should lead to the pure serial case (1) 
>>>>> instead of (3). Do you agree? In that case I think 
>>>>> CMRemarkTask::work() needs to be changed to do:
>>>>>
>>>>>         task->do_marking_step(1000000000.0 /* something very large 
>>>>> */,
>>>>>                               !_is_serial /* do_stealing */,
>>>>>                               !_is_serial /* do_termination */,
>>>>>                               _is_serial);
>>>>>
>>>>
>>>> You're right here. The serial remark case should not enable 
>>>> stealing. It doesn't make sense. Done!
>>>>
>>>>>
>>>>> The other case that sticks out is (4). We don't want to do 
>>>>> stealing, but we want to do termination even though we are serial. 
>>>>> This comes from how we set up the g1_drain_mark_stack in 
>>>>> ConcurrentMark::weakRefsWork(). We pass that along to the 
>>>>> reference processing, which I think is serial (wasn't that the 
>>>>> original problem? :) ). We also pass the g1_keep_alive which is 
>>>>> set up to be in state (1). So, I wonder if it is correct that the 
>>>>> g1_drain_mark_stack is in state (4). If you agree that we should 
>>>>> be using the pure serial case here too, I think we need to change 
>>>>> G1CMDrainMarkingStackClosure::do_void() to be:
>>>>>
>>>>>       _task->do_marking_step(1000000000.0 /* something very large */,
>>>>>                              !_is_serial /* do_stealing */,
>>>>>                              !_is_serial /* do_termination */,
>>>>>                              _is_serial  /* is_serial */);
>>>>>
>>>>>
>>>>
>>>> That is what we had before and I wasn't happy with it. The serial 
>>>> drain closure is the last thing that get's executed as part of 
>>>> reference processing - when process_phaseJNI invokes the 
>>>> complete_gc.do_void() method. In the drain closure either serial or 
>>>> parallel I think we want to go through the termination protocol. We 
>>>> just don't want to interact with the terminator in the serial case. 
>>>> There's a bunch of guarantees and checks in the termination 
>>>> protocol and we should still go through them - even when serial.
>>>>
>>>>>
>>>>> Again, I'm not sure about this, but if the above is correct we 
>>>>> will be left with only two states. The parallel and the serial. In 
>>>>> that case I think it would make sense to remove the do_stealing 
>>>>> and do_termination arguments for do_marking_step() and just pass 
>>>>> in the is_serial argument. Internally do_marking_step() could set 
>>>>> up two local variable to track stealing and termination. That 
>>>>> might be good if we see the need in the future to split these up 
>>>>> again.
>>>>
>>>> I think we can get rid of the do_stealing parameter - it's always 
>>>> the negation of the is_serial parameter and I like the idea of 
>>>> making it a local. How about that? For example
>>>>
>>>> void CMTask::do_marking_step(double ms, bool do_termination, bool 
>>>> is_serial) {
>>>>   // It does not make sense to allow stealing when called serially.
>>>>   bool do_stealing = !is_serial;
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20130318/bd288867/attachment.html 


More information about the hotspot-gc-dev mailing list