RFR(s): 8171449: [aarch64] store_klass needs to use store release

Thomas Schatzl thomas.schatzl at oracle.com
Fri Dec 23 12:31:54 UTC 2016

Hi Andrew,

On Wed, 2016-12-21 at 16:02 +0000, Andrew Haley wrote:
> On 21/12/16 16:00, Thomas Schatzl wrote:
> > 
> > That is what confused me, because the original context to me seemed
> > to talk about G1. So my answer is applicable to G1 only, sorry.
> > 
> > I will see if I can find out how CMS works (or is supposed to) in
> > that regard.
> Thanks.  This is giving me a headache.  :-)

After some digging, I think CMS is good. Here is an attempt to explain
what happens:

First, the allocation and the CMS concurrent threads always synchronize
using a global lock (_markBitMap.lock()).

The concurrent threads (marking, precleaning, sweeping) skip over
blocks that are not yet completely initialized (and that may have
garbage in the _klass) using marks in the bitmap: if an object has been
allocated during the concurrent phases, the object is marked gray using
three marks: one at the start, one at the second word, and one at the
end. From the bit at the second word, CMS knows that the object between
the start and end bits may not have been initialized yet (but its klass
is either NULL or valid due to locking, see below). CMS then uses the
start/end bits to determine the length of that block; if the bit at the
second word is not set, it knows that the next bit on the mark bitmap
must be a valid start of an object.
Pointer writes to an object after allocation (when the klass must have
been valid) will dirty the corresponding card and the concurrent gc
threads will visit the object (then fully initialized) again.

In detail, an allocating thread does something like:

set klass to NULL
  [ConcurrentMarkSweepGeneration::have_lock_and_allocate() calls
CompactibleFreeListSpace::allocate() which sets the klass to NULL
without any synchronization yet; the klass pointer is overlaid over
FreeChunk::_prev which is set to NULL; the concurrent threads will just
skip that block using the mark bitmap.]

lock _markBitMap.lock()
set first, second and last bits of that range in the bitmap
unlock _markBitMap.lock() [makes sure that Printezis bits and klass
field are visible to concurrent threads; before that concurrent threads
do not touch this area. JDK-8160369 fixed some issues with reading the
klass field if/when CMS reads from that area]
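
The allocator-side steps above can be sketched roughly as follows. This
is a toy model under stated assumptions, not the actual CMS code:
ToyHeap and allocate_in_old are hypothetical names, a std::mutex stands
in for _markBitMap.lock(), and the klass word is modelled as a plain
pointer slot per heap word.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical model: one mark bit and one klass slot per heap word.
struct ToyHeap {
    std::mutex bitmap_lock;               // plays the role of _markBitMap.lock()
    std::vector<bool> mark_bits;          // the mark bitmap
    std::vector<const void*> klass_slot;  // klass word of a block starting here

    explicit ToyHeap(std::size_t words)
        : mark_bits(words, false), klass_slot(words, nullptr) {}
};

// Allocate a block of size_words words at word index start.
// Assumption of this sketch: size_words >= 3.
void allocate_in_old(ToyHeap& heap, std::size_t start,
                     std::size_t size_words) {
    assert(size_words >= 3 && start + size_words <= heap.mark_bits.size());

    // Step 1: clear the klass word, without any synchronization yet.
    heap.klass_slot[start] = nullptr;

    // Step 2: set the three bits while holding the bitmap lock.
    {
        std::lock_guard<std::mutex> guard(heap.bitmap_lock);
        heap.mark_bits[start] = true;
        heap.mark_bits[start + 1] = true;
        heap.mark_bits[start + size_words - 1] = true;
    }
    // Releasing the lock is what makes both the cleared klass word and
    // the bits visible to any thread that later acquires the same lock.
}
```

The point of the sketch is the ordering: the unsynchronized klass write
happens before the lock is taken, so the unlock publishes it together
with the bits.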

The concurrent readers always do something like:

lock _markBitMap.lock();
while (more work) {
  do some work;
  yield [unlock _markBitMap.lock(), sleep, lock _markBitMap.lock()]
}
unlock _markBitMap.lock();

I.e. the shared mark bitmap information seems to be properly
synchronized and always usable to avoid reading junk data.
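
The reader-side loop can be sketched in the same spirit. Again this is
only an illustration under assumptions: concurrent_phase is a
hypothetical function, the "work" is a plain counter, and
std::this_thread::yield() stands in for the sleep in the real yield
methods.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Processes total_units of work in slices of units_per_slice, holding
// bitmap_lock while working and dropping it at every yield point.
// Returns the number of yields performed.
int concurrent_phase(std::mutex& bitmap_lock, int total_units,
                     int units_per_slice) {
    assert(units_per_slice > 0);
    int yields = 0;
    int done = 0;

    std::unique_lock<std::mutex> held(bitmap_lock);  // lock
    while (done < total_units) {
        // Do some work; the bitmap may be read safely here.
        for (int i = 0; i < units_per_slice && done < total_units; ++i) {
            ++done;
        }
        // Yield: allocating threads can take the lock in this window.
        held.unlock();
        std::this_thread::yield();
        held.lock();
        ++yields;
    }
    return yields;  // `held` releases the lock on scope exit
}
```

Because the reader re-acquires the lock after every yield, it always
observes whatever bits and klass values allocators published in the
meantime.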

There is some documentation about the use of mark bits in
CMSCollector::direct_allocated(). Also, there is some more
documentation on how the synchronization works above
SweepClosure::do_blk_careful for the sweeper. The others are similar
from what I saw.

Check the uses of _markBitMap.lock() to find the respective yield
methods for the various kinds of concurrent mark threads.

Overall, TLABs are only allocated in the young generation, so old gen
allocation always goes through the runtime, which, as I briefly
rechecked, uses the proper synchronization when reading the klass
values. So I think there is no issue.

I hope this answers your question.

Given this I tend to think that there is no issue here after all, but
please by all means, verify this.


Merry Christmas and a Happy New Year!

More information about the hotspot-runtime-dev mailing list