CRR (L): 6888336: G1: avoid explicitly marking and pushing objects in survivor spaces
tony.printezis at oracle.com
Wed Dec 21 22:37:44 UTC 2011
I'd like a couple of code reviews for the following non-trivial changes
(large, not necessarily in lines of code modified but more due to the fact
that the evacuation pause / concurrent marking interaction is changed
Here's some background, motivation, and a summary of the changes (I felt
that it was important to write a longer than usual explanation):
* Background / Motivation
Each G1 heap region has a field top-at-mark-start (aka TAMS) which
denotes where the top of the region was when marking started. An object
is considered implicitly live if it's over TAMS (i.e., it was allocated
since marking started) or explicitly live if it's below TAMS (i.e., it
was allocated before marking started) and marked on the bitmap. (It
follows that it's unnecessary to explicitly mark objects over TAMS.)
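As a rough illustration of the liveness rule above, here is a toy Python model (not HotSpot code). The Region class, its integer addresses, and the is_live helper are all hypothetical names made up for this sketch; an address equal to TAMS is treated as "over TAMS" since that is where the first post-marking allocation would land.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy heap region; addresses are plain integers (an assumption)."""
    bottom: int
    top: int
    tams: int                                  # top-at-mark-start
    bitmap: set = field(default_factory=set)   # explicitly marked addresses


def is_live(region, addr):
    """Apply the TAMS liveness rule to one object address."""
    if not (region.bottom <= addr < region.top):
        raise ValueError("address outside the allocated part of the region")
    if addr >= region.tams:
        # Allocated since marking started: implicitly live, no bitmap lookup.
        return True
    # Allocated before marking started: live only if marked on the bitmap.
    return addr in region.bitmap


region = Region(bottom=0, top=100, tams=60, bitmap={8})
print(is_live(region, 72), is_live(region, 8), is_live(region, 16))
```

In this model the object at 72 never needs a bit on the bitmap, which is the point of the parenthetical remark above.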
In fact, we have two copies of the above marking information: "Next TAMS
/ Next Bitmap" and "Prev TAMS / Prev Bitmap". Prev is the copy that was
obtained by the last marking cycle that was successfully completed (so,
it is consistent: all live objects should appear as live in the prev
marking information). Next is the copy that will be obtained / is
currently being obtained and it's not consistent because it's not
guaranteed to be complete.
G1 uses SATB marking, which has the advantage of not requiring objects
allocated since the start of marking to be visited at all by the marking
threads (they are implicitly live and they do not need to be scanned).
So, the active marking cycle can totally ignore objects over NTAMS
(since they have been allocated since marking started).
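To make the "ignore objects over NTAMS" point concrete, here is a minimal sketch of a marking traversal in Python. It is a simplification under stated assumptions: a single region with one NTAMS value, integer addresses, and a refs dictionary standing in for object fields; mark_from is a hypothetical name, and the SATB write barrier is left out entirely.

```python
def mark_from(roots, refs, ntams):
    """Mark objects below NTAMS reachable from roots; skip everything over it.

    refs maps an object address to the addresses it references.
    """
    marked = set()
    stack = list(roots)
    while stack:
        addr = stack.pop()
        if addr >= ntams:
            # Allocated since marking started: implicitly live, never scanned.
            continue
        if addr in marked:
            continue
        marked.add(addr)
        stack.extend(refs.get(addr, ()))
    return marked


# Object 72 was allocated after marking started, so it is neither marked
# nor scanned, even though an older object references it.
refs = {0: [8, 72], 8: [16], 72: [8]}
print(sorted(mark_from([0], refs, ntams=64)))
```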
The current interaction between evacuation pauses (let's call these
"GCs" from now on) and concurrent marking is very tricky. Even though
marking ignores all objects over NTAMS (currently: all objects in Eden
regions) it still has to visit and mark objects in the Survivors
regions. But those will be moved by subsequent GCs. So, a GC needs to be
aware that it's moving objects that have been marked by the marking
threads and not only propagate those marks but also notify the marking
threads that said objects have been moved. For that we use several data
structures: pushes to the global marking stack and also to what's
referred to as the "region stack" which is only used by the GC to push a
group of objects instead of pushing them individually ("region" here is
a mem region and smaller than a G1 region).
Additionally, because the marking threads could come across objects that
could potentially move we have to make sure that we don't leave
references to regions that have been evacuated on any marking data
structure. To do that we treat as roots all entries on the taskqueues /
global stack and drain all SATB buffers (both active buffers and also
The first issue with the above interaction is performance. Draining all
SATB buffers and scanning the mark stack and
taskqueues has been shown to be very time-consuming in some cases. Also,
having to check whether objects are marked and propagate the marks
appropriately during GC is an extra overhead.
The second issue is that it has been shown to be very fragile. We have
discovered and fixed many issues over time which were subtle and hard to
We really need to simplify the GC/marking interaction, both to improve
the performance of GCs during marking and to improve reliability.
This changeset does exactly that.
* Explanation of the changes
The goal is to ensure that all the objects that are copied by the GC do
not need to be visited by the marking threads and as a result do not
need to be explicitly marked, pushed, etc.
The first observation is that most objects copied during a GC are
allocated after marking starts and are therefore implicitly live. This
is the case for all objects on Eden regions, as well as most objects on
Survivor regions. The only exceptions are objects on the Survivor regions
during the initial-mark pause. Unfortunately, it's not easy to track
those separately as they will get mixed in with future Survivors. The
first decision to deal with this is to turn off Survivors during the
initial-mark pause. This ensures that each subsequent GC will only copy
objects that have been allocated since marking started and are therefore
implicitly live (i.e., over NTAMS).
This allows us to totally eliminate the code that propagates marks
during the GC. We just have to make sure that all copied objects are
over NTAMS. Turning off Survivors during an initial-mark pause is a bit
of a "big hammer" approach, but it will suffice for now. We have ideas
on how to re-enable them in the future and we'll explore a couple of
Given that the GC only copies objects that are implicitly marked it
follows that none of the objects that are copied during any GC should
appear on either the taskqueues or the global marking stack. Also
remember that we filter SATB buffers before enqueueing them which will
filter out all implicitly marked objects. It follows that no enqueued
SATB buffer should have references to objects that are being moved. This
leaves the currently active SATB buffers given that the code that
populates them is unconditional. But if we run the filtering on those
during each GC such "offending" references are also quickly eliminated.
So, instead of having to scan all stacks and all SATB buffers we only
have to filter the active SATB buffers, which should be much, much faster.
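The filtering step described above can be sketched roughly as follows. This is a toy Python model rather than the HotSpot implementation: the fixed REGION_SIZE, the integer addresses, the ntams_by_region table, and the filter_satb_buffer name are all assumptions made for illustration.

```python
REGION_SIZE = 100  # hypothetical region size for this toy model


def region_index(addr):
    """Map an address to the index of the region that contains it."""
    return addr // REGION_SIZE


def filter_satb_buffer(buffer, ntams_by_region):
    """Drop entries that refer to implicitly marked objects.

    An entry at or over its region's NTAMS was allocated since marking
    started, so the marking threads never need to see it.
    """
    return [addr for addr in buffer
            if addr < ntams_by_region[region_index(addr)]]


# Region 0 has NTAMS 60; region 1 was allocated entirely after marking
# started, so its NTAMS equals its bottom (100).
ntams_by_region = {0: 60, 1: 100}
active_buffer = [10, 70, 150, 40]
print(filter_satb_buffer(active_buffer, ntams_by_region))
```

Since the GC only moves objects over NTAMS, a buffer filtered this way has no entries left that refer to objects being moved.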
* Implementation Notes
The actual changes are not too extensive as they mostly
disable functionality in the GC code. The tricky part was to get the
TAMS fields correct at various phases (start of copying, start of
marking, etc.) and especially when an evacuation failure occurs. I put
all that functionality in methods on HeapRegion which do the right thing
when a GC starts, a marking starts, etc.
The most important changes are in the "main" GC code, i.e.
G1ParCopyHelper::copy_to_survivor_space(). Instead of having to
propagate marks we now only need to mark objects directly reachable from
roots during the initial-mark pause. The resulting code is much
simplified (and hopefully more performant!).
I also added a method verify_no_cset_oops() which checks that indeed all
the marking data structures do not point to regions that are being GCed
at the start / end of each GC. (BTW, I'm considering adding a develop
flag to enable this on demand.)
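The kind of check such a verification method performs could look roughly like the sketch below. It is a toy Python model, not the actual verify_no_cset_oops() code: find_cset_refs, the structure names, and the fixed region size are hypothetical.

```python
def find_cset_refs(structures, cset_regions, region_size):
    """Return (structure name, address) pairs that point into the
    collection set.

    structures maps a name to an iterable of addresses; cset_regions is
    the set of region indices being collected.
    """
    offenders = []
    for name, entries in structures.items():
        for addr in entries:
            if addr // region_size in cset_regions:
                offenders.append((name, addr))
    return offenders


structures = {
    "global_stack": [10, 20],
    "task_queue_0": [250],
    "satb_active": [130],
}
# Region 1 (addresses 100..199) is in the collection set.
print(find_cset_refs(structures, cset_regions={1}, region_size=100))
```

A verification pass would treat any non-empty result as a failure.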
I should point out that this changeset will leave a lot of dead code.
However, I took the decision to keep the changes to a minimum in order
not to overwhelm the code reviewers and to make the important changes clearer.
(I also discussed this with a couple of potential code reviewers and
they agreed that this is a good approach.) I temporarily added
guarantees to ensure that methods that should not be called are not
called. I will remove all dead code with a future push.
I also have to apologize to John Cuthbertson for removing a lot of code
he's added to deal with various bugs we had in the GC/marking
interaction. Hopefully the new code will be less fragile compared to
what we've had so far and John will be able to concentrate on more
interesting features than trying to track down hard-to-reproduce failures!