request for review (M): 6883834: ParNew: assert(!_g->to()->is_in_reserved(obj), "Scanning field twice?") with LargeObjects tests
Y. Srinivas Ramakrishna
y.s.ramakrishna at oracle.com
Thu May 5 01:23:32 PDT 2011
6883834: ParNew: assert(!_g->to()->is_in_reserved(obj),"Scanning field twice?") with LargeObjects tests
Investigation of this crash led to several avenues
towards fixing the problem, including a
more scalable and performant rewrite of the parallelization
of card scanning which would have fixed the bug as well,
and which was indeed the direction in which I had been
initially headed. However, I realized last week that,
given the upcoming hs21 code freeze, and the subtleties
in the current code, that might not be the most sensible
thing to do at this time. At that point, I decided to fix
only the correctness problem(s) indicated by the assertion
that we hit and not address in any manner the relative
lack of scalability in the current implementation in
the presence of large non-array objects. Somewhat
offsetting the lack of that scalability is, however,
the fact that the situation in which scalability suffers,
namely very large sparsely dirty non-array objects are the exception
in most programs (not so, very large sparsely dirty
array objects which are by far the more common).
The problem in the current code turned out to be
(1) it may scan too little and miss some pointers:
when a large non-array object straddles entirely over a
complete "chunk" (the parallel unit of work) of the card table,
its head is dirty (so we are forced to assume an imprecise
card mark and scan it in its entirety), then
we may miss any part of it that extends past the end
of its second chunk. Currently chunks are fixed at
256 cards == 128K, so any object larger than that
size is potentially vulnerable.
(2) it may scan too much and thus scan some pointers twice:
when a large object straddles onto a subsequent chunk
its head is dirty, and the next card is not dirty,
the intention of the code was that
the thread scanning the chunk in which the head of the
object lay would scan the object in its entirety
and the thread scanning the next chunk would not.
However, there was a flaw in the way card-dirtiness
was being tested for an object straddling into a
chunk from the left and how it was being tested for
an object straddling past the end (or right).
This could, under the right conditions, result
in a suffix of the object in the
second of the two chunks above being scanned twice.
While it was the second scenario that caused the
assertion violation, the first could lead to more
insidious problems with stale pointers left in old
generation or permanent generation objects.
The fix for (1) was to not terminate the search for the
last card to scan to the right not at the end of the
next chunk, but go right to the end of the object in
The fix for (2) was to make sure the correct predicate
was being tested for the card-values in order to ensure
that the thread for the left chunk did not scan beyond
the end of a specific point, as exemplified in the
Along the way, I added a few assertion checks, some verbose logging
(disabled in all builds but which can be enabled via a source
level change: I'd like to leave this in for future debuggability
and code-comprehensibility), elaborated and corrected the documentation
comments and changed the order of filling the "LNC array"
(which is useful to and consulted by leftward neighbouring workers)
and determining the rightward extent of the chunk scan
past the end of the chunk. I also tightened the set of
conditions under which we extend our scan to the right,
which can save some unnecessary card value examination
in some cases.
Note that this is a nearly minimal fix to the correctness
problem in parallel card-scanning with ParNew. A more
extensive rewrite of the code with an eye to simplifying
some of the convolutedness of its current form and of
improving its scalability when there are very large
objects in the heap (or, as in this case, large constant
pools in the perm gen), will be the subject of an RFE
PS: I also pushed some of the closure construction
closer to its use. This will also make it easier to
do implement the performance RFE to follow in hs22.
In the same vein, I changed some defined constants
related to chunk size and parallel chunk strides
into global diagnostic flags. I did leave the current
values unchanged, although it is quite likely that these
values could be tuned some, an activity that will be
part of the aforementioned performance RFE.
A quick performance run indicated no change in performance,
as one might expect given the rarity of very large
non-array objects in most applications. More thorough performance
testing (with reference server refworkload) is in progress.
The jck60001 test that tripped the assert; refworkload; JPRT;
all three with and without heap verification on a variety
of different platforms and with the chunk size varied for
thanks for your reviews!
More information about the hotspot-gc-dev