RFR(S): 8004816: G1: Kitchensink failures after marking stack changes
jon.masamitsu at oracle.com
Fri Dec 14 06:51:23 PST 2012
Your choice of reset_marking_state() below is fine.
On 12/13/2012 3:27 PM, John Cuthbertson wrote:
> Hi Jon,
> Thanks. I agree a name change would OK. Any particular preference?
> Currently clear_marking_state() is called on the following paths....
> The initial mark path...
> CheckpointRootsInitialPre -> reset() -> clear_marking_state().
> ConcurrentMark initialization path...set_non_marking_state clears that
> marking is in progress.
> ConcurrentMark() -> set_non_marking_state() -> clear_marking_state().
> Marking is finished path (from remark)....
> CheckpointRootsFinal -> set_non_marking_state() -> clear_marking_state()
> Overflow in remark path (1)....
> CheckpointRootsFinal -> clear_marking_state()
> Overflow from CMTask::do_marking_step() (overflow during actual
> marking or remark) by worker 0 only....
> do_marking_step() -> enter_first_barrier_sync() -> clear_marking_state().
> I think that's all of them. Given these, my preference would be
> Thanks again for the review.
> On 12/13/12 15:12, Jon Masamitsu wrote:
>> Your fix looks good.
>> A small suggestion. Instead of clear_marking_state() how
>> about reinitialize_marking_state() or reset_marking_state()?
>> When I went to look at what "clear_marking_state"
>> did, I expected it to simply set a variable but it does
>> more. I thought that "reinitialize" or "reset" would
>> warn the reader that more was happening.
>> If this is in keeping with G1 naming
>> convention, it's fine to leave it.
>> On 12/13/12 09:45, John Cuthbertson wrote:
>>> Hi Everyone,
>>> Can I have a couple of volunteers review the fix for this CR? The
>>> webrev can be found at:
>>> The reduced mark stack size that came in as part of the changes for
>>> 8000244 exposed this issue. If the marking stack overflowed as a
>>> result of pushes from the serial reference processing closures, the
>>> marking stack overflow flag was not cleared. When marking eventually
>>> completed, after the subsequent restarting, the still set overflow
>>> flag was detected and resulted in a guarantee failure. I also
>>> discovered that the actual marking state was not correctly cleared
>>> for restarting for such an overflow and, as a result, some referent
>>> objects might have been skipped by marking (those that are reachable
>>> from the objects we failed to push as a result of the overflow).
>>> Normally this is not a problem as the serial reference processing
>>> code is executed infrequently - except on systems with one or two
>>> cpus. The parallel reference processing closures reset the marking
>>> state correctly in the event of an overflow in the global marking
>>> Kitchensink on 1 and 2 cpu systems with a very low mark stack size
>>> and marking verification.
More information about the hotspot-gc-dev