RFR (S): 8182703: Correct G1 barrier queue lock orderings
erik.osterlund at oracle.com
Mon Jun 26 13:34:22 UTC 2017
The G1 barrier queues have very awkward lock orderings for the following
1) These queues may queue up things when performing a reference write or
resolving a jweak (intentionally or just happened to be jweak, even
though it looks like a jobject), which can happen in a lot of places in
the code. We resolve JNIHandles while holding special locks in many
places. We perform reference writes also in many places. Now the
unsuspecting hotspot developer might think that it is okay to resolve a
JNIHandle or perform a reference write while possibly holding a special
lock. But no. In some cases, object writes have been moved out of locks
and replaced with lock-free CAS, only to dodge the G1 write barrier
locks. I don't think the G1 lock ordering issues should shape the shared
code rather than the other way around.
2) There is an issue that the shared queue locks have a "special" rank,
which is below the lock ranks used by the cbl monitor and free list
monitor. This leads to an issue when these locks have to be taken while
holding the shared queue locks. The current solution is to drop the
shared queue locks temporarily, introducing nasty data races. These
races are guarded, but the whole race seems very unnecessary.
I argue that if the G1 write barrier queue locks were simply set
appropriately in the first place by analyzing what ranks they should
have, none of the above issues would exist. Therefore I propose this new
Specifically, I recognize that locks required for performing memory
accesses and resolving JNIHandles are more special than the "special"
rank. Therefore, this change introduces a new lock ordering category
called "access", which is to be used by barriers required to perform
memory accesses. In other words, by recognizing the rank is more special
than "special", we can remove "special" code to walk around making its
rank more "special". That seems desirable to me. The access locks need
to comply to the same constraints as the special locks: they may not
perform safepoint checks.
The old lock ranks were:
SATB_Q_CBL_mon: leaf - 1
Shared_SATB_Q_lock: leaf - 1
DirtyCardQ_CBL_mon: leaf - 1
Shared_DirtyCardQ_lock: leaf - 1
The new lock ranks are:
SATB_Q_FL_lock: access (special - 2)
SATB_Q_CBL_mon: access (special - 2)
Shared_SATB_Q_lock: access + 1 (special - 1)
DirtyCardQ_FL_lock: access (special - 2)
DirtyCardQ_CBL_mon: access (special - 2)
Shared_DirtyCardQ_lock: access + 1 (special - 1)
Each PtrQueue and PtrQueueSet group, SATB or DirtyCardQ have the same
group of locks. The free list lock, the completed buffer list monitor
and the shared queue lock.
1) The free list lock and completed buffer list monitors (members of
PtrQueueSet) are disjoint. We never hold both of them at the same time.
Rationale: The free list lock is only used from
PtrQueueSet::allocate_buffer, PtrQueueSet::deallocate_buffer and
PtrQueueSet::reduce_free_list, and no callsite from there can be
expanded where the cbl monitor is acquired. So therefore it is
impossible to acquire the cbl monitor while holding the free list lock.
The opposite case of acquiring the free list lock while holding the cbl
monitor is also not possible; only the following places acquire the cbl
SATBMarkQueueSet::abandon_partial_marking. Again, neither of these paths
where the cbl monitor is held can expand callsites to a place where the
free list locks are held. Therefore it holds that the cbl monitor can
not be held while the free list lock is held, and the free list lock can
not be held while the cbl monitor is held. Therefore they are held
2) We might hold the shared queue locks before acquiring the completed
buffer list monitor. (today we drop the shared queue lock then and
reacquire it later as a hack as already described)
3) We do not acquire a shared queue lock while holding the free list
lock or completed buffer list monitor, as there is no reference from a
PtrQueueSet to its shared queue, so those code paths do not know how to
reach the shared PtrQueue to acquire its lock. The derived classes are
exceptions but they never use the shared queue lock while holding the
completed buffer list monitor or free list lock. DirtyCardQueueSet uses
the shared queue for concatenating logs (in a safepoint without holding
those locks). The SATBMarkQueueSet uses the shared queue for filtering
the buffers, fiddling with activeness, printing and resetting, all
without grabbing any locks.
4) We do not acquire any other lock (above event) while holding the free
list lock or completed buffer list monitors. This was discovered by
manually expanding the call graphs from where these two locks are held.
a) Because of observation 1, the free list lock and completed buffer
list monitors can have the same rank.
b) Because of observations 1 and 2, the shared queue lock ought to have
a rank higher than the ranks of the free list lock and the completed
buffer list monitors (not the case today).
c) Because of of observation 3 and 2, the free list lock and completed
buffer list monitors ought to have a rank lower than the rank of the
shared queue lock.
d) Because of observation 4 (and constraints a-c), all the barrier locks
should be below the "special" rank without violating any existing ranks.
The proposed new lock ranks conform to the constraints derived from my
observations. It is worth noting that the potential relationship that
could break (and why they do not) are:
1) If a lock is acquired from within the barriers that does not involve
the shared queue lock, the free list lock or the completed buffer list
monitor, we have now inverted their relationship as that other lock
would probably have a rank higher than or equal to "special". But due to
observation 4, there are no such cases.
2) The relationship between the shared queue lock and the completed
buffer list monitor has been changed so both can be held at the same
time if the shared queue lock is acquired first (which it is). This is
arguably the way it should have been from the first place, and the old
solution had ugly hacks where we would drop the shared queue lock to not
run into the lock order assert (and only not to run into the lock order
assert, i.e. not to avoid potential deadlock) to ensure the locks are
not held at the same time. That code has now been removed, so that the
shared queue lock is still held when enqueueing completed buffers (no
dodgy dropping and reclaiming), and the code for handling the races due
to multiple concurrent enqueuers has also been removed and replaced with
an assertion that there simply should not be multiple concurrent
enqueuers. Since the shared queue lock is now held throughout the whole
operation, there will be no concurrent enqueuers.
3) The completed buffer list monitor used to have a higher rank than the
free list lock. Now they have the same. Therefore, they could previously
allow them to be held at the same time if the cbl monitor was acquired
first. However, as discussed, there is no such case, and they ought to
have the same rank not to confuse their true disjointness. If anyone
insists we do not break this relationship despite the true disjointness,
I could consent to adding another access lock rank, like this:
http://cr.openjdk.java.net/~eosterlund/8182703/webrev.01/ but I think it
seems better to have the same rank since they are actually truly
disjoint and should remain disjoint.
I do recognize that long term, we *might* want a lock-free solution or
something (not saying we do or do not). But until then, the ranks ought
to be corrected so that they do not cause these problems causing
everyone to bash their head against the awkward G1 lock ranks throughout
the code and make hacks around it.
Testing: JPRT with hotspot all and lots of local testing.
More information about the hotspot-gc-dev