RFR(S): 8004816: G1: Kitchensink failures after marking stack changes

John Cuthbertson john.cuthbertson at oracle.com
Mon Dec 17 18:06:29 PST 2012


Hi Everyone,

Here's a new webrev based upon the comments from Jon:

http://cr.openjdk.java.net/~johnc/8004816/webrev.1/

Thanks,

JohnC

On 12/14/12 06:51, Jon Masamitsu wrote:
> John,
>
> Your choice of reset_marking_state() below is fine.
>
> Jon
>
> On 12/13/2012 3:27 PM, John Cuthbertson wrote:
>> Hi Jon,
>>
>> Thanks. I agree a name change would OK. Any particular preference?
>>
>> Currently clear_marking_state() is called on the following paths....
>>
>> The initial mark path...
>>
>> CheckpointRootsInitialPre -> reset() -> clear_marking_state().
>>
>> ConcurrentMark initialization path...set_non_marking_state clears 
>> that marking is in progress.
>>
>> ConcurrentMark() -> set_non_marking_state() -> clear_marking_state().
>>
>> Marking is finished path (from remark)....
>>
>> CheckpointRootsFinal -> set_non_marking_state() -> clear_marking_state()
>>
>> Overflow in remark path (1)....
>>
>> CheckpointRootsFinal -> clear_marking_state()
>>
>> Overflow from CMTask::do_marking_step() (overflow during actual 
>> marking or remark) by worker 0 only....
>>
>> do_marking_step() -> enter_first_barrier_sync() -> 
>> clear_marking_state().
>>
>> I think that's all of them. Given these, my preference would be 
>> reset_marking_state().
>>
>> Thanks again for the review.
>>
>> JohnC
>>
>> On 12/13/12 15:12, Jon Masamitsu wrote:
>>> John,
>>>
>>> Your fix looks good.
>>>
>>> A small suggestion.  Instead of clear_marking_state() how
>>> about reinitialize_marking_state() or reset_marking_state()?
>>> When I went to look at what "clear_marking_state"
>>> did, I expected it to simply set a variable but it does
>>> more.  I thought that "reinitialize" or "reset" would
>>> warn the reader that more was happening.
>>> If this is in keeping with G1 naming
>>> convention, it's fine to leave it.
>>>
>>> Jon
>>>
>>>
>>>
>>> On 12/13/12 09:45, John Cuthbertson wrote:
>>>> Hi Everyone,
>>>>
>>>> Can I have a couple of volunteers review the fix for this CR? The 
>>>> webrev can be found at: 
>>>> http://cr.openjdk.java.net/~johnc/8004816/webrev.0/
>>>>
>>>> Summary:
>>>> The reduced mark stack size that came in as part of the changes for 
>>>> 8000244 exposed this issue. If the marking stack overflowed as a 
>>>> result of pushes from the serial reference processing closures, the 
>>>> marking stack overflow flag was not cleared. When marking 
>>>> eventually completed, after the subsequent restarting, the still 
>>>> set overflow flag was detected and resulted in a guarantee failure. 
>>>> I also discovered that the actual marking state was not correctly 
>>>> cleared for restarting for such an overflow and, as a result, some 
>>>> referent objects might have been skipped by marking (those that are 
>>>> reachable from the objects we failed to push as a result of the 
>>>> overflow).
>>>>
>>>> Normally this is not a problem as the serial reference processing 
>>>> code is executed infrequently - except on systems with one or two 
>>>> cpus. The parallel reference processing closures reset the marking 
>>>> state correctly in the event of an overflow in the global marking 
>>>> stack.
>>>>
>>>> Testing:
>>>> Kitchensink on 1 and 2 cpu systems with a very low mark stack size 
>>>> and marking verification.
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>



More information about the hotspot-gc-dev mailing list