RFR: 8277434: tests fail with "assert(is_forwarded()) failed: only decode when actually forwarded" [v2]
kbarrett at openjdk.java.net
Tue Nov 30 20:26:44 UTC 2021
> Please review this change which fixes a rare assert failure in
> G1ParScanThreadState::write_ref_field_post. The problem is that the assert is
> referencing values from the forwardee, and that isn't safe. When an object is
> copied and forwarded to the copy, the forwarding is performed with a relaxed
> cmpxchg. This means the data being copied into the forwardee might not be
> visible to a different thread that has obtained the forwardee. All uses of a
> forwardee must take care to avoid reading from it without performing
> additional synchronization. The assert in question violates that restriction.
> The incorrect assertion is
> assert(dest_attr.is_in_cset() == (obj->is_forwarded() && obj->forwardee() == obj), "...");
> obj is the forwardee of a different object.
> It's okay here to examine the contents of obj when it's in the cset. In
> that case it's not a copy; it's the original object, self-forwarded due to
> evacuation failure.
> But it's not okay to examine the contents of obj when it's not in the cset,
> e.g. when it's a forwardee copy of the original object. Doing so violates
> the above restriction.
> So that tells us this code is wrong, but it's still not obvious this is the
> cause of the failure in question. The failure we're seeing appears to be
> that obj->is_forwarded() returns true, but the assert of the same thing in
> forwardee returns false. But is_forwarded() should never be true for a
> copied forwardee. But it turns out a copied forwardee might temporarily
> appear to be forwarded, due to the unordered copy + forward sequence.
> Consider the following scenario:
> A thread is evacuating an object and allocates a candidate forwardee, copies
> into it, and finally discovers the forwarding race was lost even before the
> copy. The candidate forwardee appears to be marked as forwarded. That
> candidate is deallocated and the real forwardee is used by the thread.
> That same thread evacuates a second object, reallocating some of the same
> memory as from the previous candidate. It copies the second object into the
> new forwardee, successfully performs the forwarding, and uses that newly
> created forwardee.
> A second thread attempts to evacuate that same second object, and finds it
> already forwarded. It goes into write_ref_field_post, reaches the assert,
> and attempts to use values from the forwardee. Because the copy and the
> forwarding by the first thread were unordered, this second thread might see
> the old copy that appears to be forwarded, rather than the up to date copy.
> So the first call to is_forwarded() returns true. The forwardee() assert of
> the same thing might instead get the up to date contents, resulting in the
> reported failure. Alternatively, it might still get old data, in which case
> nothing gets noticed because the self-forward check will fail and the
> write_ref_field_post assert will (accidentally) succeed.
> So there's a very small window in which the reported failure can occur. But
> it's all caused by the write_ref_field_post assert performing invalid
> accesses. Stop doing that and the problem should go away.
> I think this failure couldn't have happened before JDK-8271579, but that's
> kind of accidental and doesn't change that the write_ref_field_post assert
> was performing invalid accesses. (Those accesses were introduced later.)
> mach5 tier1-5.
> Ran applications/jcstress/acqrel.java 100 times without failure.
> Without the change I had a 25-30% failure rate for that test.
> I don't know why that test was such a good canary, when this failure seems to
> otherwise be pretty rare, but glad that it was.
Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
- Merge branch 'master' into fix_forwarding_assert
- remove test from problemlist
- avoid assert based on possibly bad data
- all: https://git.openjdk.java.net/jdk/pull/6582/files
- new: https://git.openjdk.java.net/jdk/pull/6582/files/25daf289..4b353c4f
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6582&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6582&range=00-01
Stats: 1461 lines in 73 files changed: 606 ins; 581 del; 274 mod
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6582/head:pull/6582
More information about the hotspot-gc-dev