RFR: 8277434: tests fail with "assert(is_forwarded()) failed: only decode when actually forwarded"
Albert Mingkun Yang
ayang at openjdk.java.net
Mon Nov 29 08:11:03 UTC 2021
On Sun, 28 Nov 2021 03:30:25 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
> Please review this change which fixes a rare assert failure in
> G1ParScanThreadState::write_ref_field_post. The problem is that the assert is
> referencing values from the forwardee, and that isn't safe. When an object is
> copied and forwarded to the copy, the forwarding is performed with a relaxed
> cmpxchg. This means the data being copied into the forwardee might not be
> visible to a different thread that has obtained the forwardee. All uses of a
> forwardee must take care to avoid reading from it without performing
> additional synchronization. The assert in question violates that restriction.
> The incorrect assertion is
> assert(dest_attr.is_in_cset() == (obj->is_forwarded() && obj->forwardee() == obj), "...");
> obj is the forwardee of a different object.
> It's okay here to examine the contents of obj when it's in the cset. In
> that case it's not a copy; it's the original object, self-forwarded due to
> evacuation failure.
> But it's not okay to examine the contents of obj when it's not in the cset,
> e.g. when it's a forwardee copy of the original object. Doing so violates
> the above restriction.
> So that tells us this code is wrong, but it's still not obvious this is the
> cause of the failure in question. The failure we're seeing appears to be
> that obj->is_forwarded() returns true, but the assert of the same thing in
> forwardee returns false. But is_forwarded() should never be true for a
> copied forwardee. But it turns out a copied forwardee might temporarily
> appear to be forwarded, due to the unordered copy + forward sequence.
> Consider the following scenario:
> A thread is evacuating an object and allocates a candidate forwardee, copies
> into it, and finally discovers the forwarding race was lost even before the
> copy. The candidate forwardee appears to be marked as forwarded. That
> candidate is deallocated and the real forwardee is used by the thread.
> That same thread evacuates a second object, reallocating some of the same
> memory as from the previous candidate. It copies the second object into the
> new forwardee, successfully performs the forwarding, and uses that newly
> created forwardee.
> A second thread attempts to evacuate that same second object, and finds it
> already forwarded. It goes into write_ref_field_post, reaches the assert,
> and attempts to use values from the forwardee. Because the copy and the
> forwarding by the first thread were unordered, this second thread might see
> the old copy that appears to be forwarded, rather than the up to date copy.
> So the first call to is_forwarded() returns true. The forwardee() assert of
> the same thing might instead get the up to date contents, resulting in the
> reported failure. Alternatively, it might still get old data, in which case
> nothing gets noticed because the self-forward check will fail and the
> write_ref_field_post assert will (accidentally) succeed.
> So there's a very small window in which the reported failure can occur. But
> it's all caused by the write_ref_field_post assert performing invalid
> accesses. Stop doing that and the problem should go away.
> I think this failure couldn't have happened before JDK-8271579, but that's
> kind of accidental and doesn't change that the write_ref_field_post assert
> was performing invalid accesses. (Those accesses were introduced later.)
> mach5 tier1-5.
> Ran applications/jcstress/acqrel.java 100 times without failure.
> Without the change I had a 25-30% failure rate for that test.
> I don't know why that test was such a good canary, when this failure seems to
> otherwise be pretty rare, but glad that it was.
Thank you for the detailed analysis; this makes perfect sense.
Marked as reviewed by ayang (Reviewer).
More information about the hotspot-gc-dev