RFR: 8205577: parallel/TestPrintGCDetailsVerbose.java fails assertion
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Jun 27 13:15:27 UTC 2018
Wonderful analysis Kim!
(old) L1694: (*p)->print_value_on(&ls);
I agree with not printing, but you might want to leave a
comment behind. Something like:
// Do not try "(*p)->print_value_on()" here because it
// is racy with parallel operations.
Thumbs up in any case.
On 6/27/18 1:03 AM, Kim Barrett wrote:
> Please review this fix of an assertion failure during ParallelGC young
> We have a task (thread) walking the code cache, copying and forwarding
> oop references in those cached nmethods containing scavengable
> (e.g. young) oops. After the oops in an nmethod are processed, a
> second pass is made to determine whether any of the oops are still
> scavengable, removing the nmethod from the list containing scavengable
> oops. That speeds up future collection cycles.
> That second pass check logs the values of any scavengable oops
> referred to by the nmethod. This is where things go wrong. If the
> object being printed is a j.l.Class, deep in the guts of the printing
> there is an assert that the mirror of the klass of the object is that
> object. But if the klass mirror has not yet been forwarded, we can be
> comparing a forward object with the not yet forward version, and the
> assertion fails.
> There is another task that is running in parallel with the code cache
> walk that is processing the CLD graph, but it may not have reached the
> relevant klass yet, so the klass's mirror hasn't been forwarded yet.
> But there's another complication that makes this crash even more
> difficult to trigger. The ParallelGC young collection's code cache
> walk promotes nmethod-referenced oops, and that promotion makes the
> object no longer scavengable. So to trigger the crash, some third
> thread must have copied a young class to survivor space, then the code
> cache walk processes an nmethod referring to that class and prints the
> still young class, all before the CLD graph walk has forwarded the
> klass's mirror.
> This problem has been lurking all along, but JDK-8203837 changed the
> printing from only occurring when TraceScavenge was true to printing
> when gc+nmethod=trace logging is enabled. As there were no tests that
> enabled TraceScavenge, we didn't previously notice the problem. But
> there are tests that enable gc+nmethod=trace, enabling the problematic
> The solution being taken here is to change that logging to no longer
> attempt to print the object. Printing an object while in the middle
> of a copying GC, especially one that is parallel, seems quite risky,
> so let's just not do that.
> mach5 tier1,2,3
> I haven't been able to reproduce the failure; as discussed above, it's
> very timing sensitive. But the scenario described above accounts for
> both the problem and the timing of its appearance.
More information about the hotspot-dev