Optimizing card table scanning in CMS collector

Alexey Ragozin alexey.ragozin at gmail.com
Wed Jul 6 11:13:02 UTC 2011

I have done a few experiments to analyze the cost factors affecting the
pause duration of young GC.
Here are some interesting results:
It turns out that the ClearNoncleanCardWrapper::do_MemRegion method is a
severe bottleneck.
The current implementation of this method scans the card table byte by byte,
which takes too many CPU cycles. Normally the majority of cards are clean, so
I have added a fast path to this method which tests a whole row of 8 bytes
at once. Tests have shown a roughly 8-fold reduction in card table scan time
from this optimization on the serial collector.
On the CMS ParNew collector I had to increase the stride size
(-XX:ParGCCardsPerStrideChunk=4096) to see the effect.

Modified code of the method (cardTableRS.cpp):

void ClearNoncleanCardWrapper::do_MemRegion(MemRegion mr) {
  assert(mr.word_size() > 0, "Error");
  assert(_ct->is_aligned(mr.start()), "mr.start() should be card aligned");
  // mr.end() may not necessarily be card aligned.
  jbyte* cur_entry = _ct->byte_for(mr.last());
  const jbyte* limit = _ct->byte_for(mr.start());
  HeapWord* end_of_non_clean = mr.end();
  HeapWord* start_of_non_clean = end_of_non_clean;
  while (cur_entry >= limit) {
    HeapWord* cur_hw = _ct->addr_for(cur_entry);
    if ((*cur_entry != CardTableRS::clean_card_val()) &&
        clear_card(cur_entry)) {
      // Continue the dirty range by opening the
      // dirty window one card to the left.
      start_of_non_clean = cur_hw;
    } else {
      // We hit a "clean" card; process any non-empty
      // "dirty" range accumulated so far.
      if (start_of_non_clean < end_of_non_clean) {
        const MemRegion mrd(start_of_non_clean, end_of_non_clean);
        _dirty_card_closure->do_MemRegion(mrd);
      }

      // fast forward via continuous range of clean cards
      // hardcoded 64 bit version
      if ((((jlong)cur_entry) & 7) == 0) {
        jbyte* cur_row = cur_entry - 8;
        while (cur_row >= limit) {
          if (*((jlong*)cur_row) == ((jlong)-1) /* hardcoded row of 8 clean cards */) {
            cur_row -= 8;
          } else {
            break;
          }
        }
        // "cur_row" is now the first row that is not entirely clean
        // (or that starts below "limit"). Resume the byte-wise scan at
        // its last card, with an empty dirty window ending there.
        cur_entry = cur_row + 7;
        HeapWord* last_hw = _ct->addr_for(cur_row + 8);
        end_of_non_clean = last_hw;
        start_of_non_clean = last_hw;
        // "cur_entry" already points at the next card to examine,
        // so skip the decrement at the bottom of the loop.
        continue;
      } else {
        // Reset the dirty window, while continuing to look
        // for the next dirty card that will start a
        // new dirty window.
        end_of_non_clean = cur_hw;
        start_of_non_clean = cur_hw;
      }
    }
    // Note that "cur_entry" leads "start_of_non_clean" in
    // its leftward excursion after this point
    // in the loop and, when we hit the left end of "mr",
    // will point off of the left end of the card-table
    // for "mr".
    cur_entry--;
  }
  // If the first card of "mr" was dirty, we will have
  // been left with a dirty window, co-initial with "mr",
  // which we now process.
  if (start_of_non_clean < end_of_non_clean) {
    const MemRegion mrd(start_of_non_clean, end_of_non_clean);
    _dirty_card_closure->do_MemRegion(mrd);
  }
}
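
The fast-path idea can also be tried outside of HotSpot. The following is a
minimal standalone sketch, not the patch itself: it scans a plain byte array
from high index to low and skips any 8-card row that is entirely clean with a
single 64-bit comparison. The names scan_cards, ScanResult, kCleanCard and
kCleanRow are made up for this illustration, rows are aligned by index rather
than by address, and the row is loaded with memcpy to stay within portable
C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the card table values: one byte per card,
// 0xFF means "clean" (so a row of 8 clean cards is all bits set).
constexpr std::uint8_t kCleanCard = 0xFF;
constexpr std::uint64_t kCleanRow = ~std::uint64_t{0};

struct ScanResult {
  std::vector<std::size_t> dirty;  // indices of non-clean cards, descending
  std::size_t row_checks = 0;      // number of 8-byte comparisons
  std::size_t byte_checks = 0;     // number of single-card comparisons
};

// Scans cards[0..n) from the highest index down to the lowest.
ScanResult scan_cards(const std::uint8_t* cards, std::size_t n) {
  assert(cards != nullptr || n == 0);
  ScanResult r;
  std::size_t i = n;  // cards[i - 1] is the next card to examine
  while (i > 0) {
    if (i % 8 == 0) {
      // Fast path: cards[i - 8 .. i) form a full row; test it in one go.
      std::uint64_t row;
      std::memcpy(&row, cards + (i - 8), sizeof row);
      ++r.row_checks;
      if (row == kCleanRow) {
        i -= 8;
        continue;
      }
    }
    // Slow path: examine a single card.
    --i;
    ++r.byte_checks;
    if (cards[i] != kCleanCard) {
      r.dirty.push_back(i);
    }
  }
  return r;
}
```

For a 64-card table with only cards 10 and 37 marked, this sketch performs 8
row checks and 16 byte checks instead of 64 byte checks; the sparser the
dirty cards, the closer the saving gets to the factor of 8.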

Some more information about testing and test results is available here

On my real application the effect of this patch was a 2.5x reduction in
average GC pause duration for a 28GiB heap. I really hope to see that kind
of improvement in the mainstream JDK soon.

Thank you

On Wed, Jun 15, 2011 at 12:03 PM, Alexey Ragozin
<alexey.ragozin at gmail.com> wrote:

> Hi,
> Recently I was analyzing CMS GC pause times on a JVM with 32GB of heap
> (using an Oracle Coherence node as a sample application). It seems that
> young collection pause time is totally dominated by the time required to
> scan the card table (I suppose the size of the table should be 64MB in
> this case). I believe the time to scan the card table could be cut
> significantly at the price of a slightly more complex write barrier. By
> introducing super-cards, the collector can avoid scanning whole ranges of
> the card table. I would like to implement a POC to prove the reduction of
> young collection pause (it should probably also reduce CMS remark pause
> time).
> I need advice on locating the right places for modification in the code
> base (I'm not familiar with it). I think I can ignore the JIT for the sake
> of the POC (running the JVM in interpreter mode). So I need to modify the
> write barrier used in the interpreter and the card table scanning
> procedure.
> Thank you for your advice.
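
The super-card idea from the quoted message can be sketched as a two-level
table: the write barrier marks both the card and a coarser summary entry, and
the scanner skips every range whose summary entry is clean. Everything below
is a hypothetical illustration rather than HotSpot code: the class name
TwoLevelCardTable, the choice of 64 cards per super-card and the clean/dirty
byte values are assumptions. With 512-byte cards a 32GB heap has 2^35 / 2^9 =
2^26 cards, which is the 64MB table estimated above; under the assumed ratio
the summary table would be 1MB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kCardShift = 9;    // 512-byte cards
constexpr std::size_t kSuperShift = 6;   // 64 cards per super-card (assumed)
constexpr std::uint8_t kClean = 0xFF;
constexpr std::uint8_t kDirty = 0x00;

class TwoLevelCardTable {
 public:
  explicit TwoLevelCardTable(std::size_t heap_bytes)
      : cards_(round_up_shift(heap_bytes, kCardShift), kClean),
        supers_(round_up_shift(cards_.size(), kSuperShift), kClean) {}

  // Write barrier: two unconditional byte stores instead of one.
  void mark(std::size_t heap_offset) {
    const std::size_t card = heap_offset >> kCardShift;
    assert(card < cards_.size());
    cards_[card] = kDirty;
    supers_[card >> kSuperShift] = kDirty;
  }

  // Returns the indices of dirty cards in ascending order and reports how
  // many individual cards had to be examined to find them.
  std::vector<std::size_t> scan(std::size_t* cards_examined) const {
    std::vector<std::size_t> dirty;
    std::size_t examined = 0;
    for (std::size_t s = 0; s < supers_.size(); ++s) {
      if (supers_[s] == kClean) continue;  // skip a whole range of cards
      const std::size_t first = s << kSuperShift;
      const std::size_t last =
          std::min(first + (std::size_t{1} << kSuperShift), cards_.size());
      for (std::size_t c = first; c < last; ++c) {
        ++examined;
        if (cards_[c] != kClean) dirty.push_back(c);
      }
    }
    *cards_examined = examined;
    return dirty;
  }

  std::size_t card_count() const { return cards_.size(); }
  std::size_t super_count() const { return supers_.size(); }

 private:
  // Number of (1 << shift)-sized units needed to cover n.
  static std::size_t round_up_shift(std::size_t n, std::size_t shift) {
    return (n + (std::size_t{1} << shift) - 1) >> shift;
  }

  std::vector<std::uint8_t> cards_;   // declared first: supers_ depends on it
  std::vector<std::uint8_t> supers_;
};
```

The trade-off is the one named in the message: every reference store pays for
an extra byte write, and in exchange the scan touches only the card ranges
whose super-card was marked.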