Fwd: (resend) Request for review (S): 7068625 Testing 8 bytes of card table entries at a time speeds up card-scanning

John Coomes John.Coomes at oracle.com
Mon Feb 27 18:48:57 PST 2012


Bengt Rutisson (bengt.rutisson at oracle.com) wrote:
> 
> Hi all,
> 
> Just pinging this review request. Does anybody have some time to look at 
> it? It is a fairly small and straight forward change...

Semantically looks ok.  Some style issues, though.

cardTableRS.cpp:
---------------

I'd rather see the loop on 207 rewritten as:

 207         while (cur_row >= limit &&
 208                *((intptr_t*)cur_row) == CardTableRS::clean_card_row()) {
 209           cur_row -= BytesPerWord;
 210         }

as it's shorter and eliminates the 'break'.

There are also whitespace problems (inconsistent indentation, 4 spaces
instead of 2, no space after while, etc.).

The comment that starts with 'Reset the dirty window' is now in only
one of the places that does the reset.  And this comment:

    // Note that "cur_entry" leads "start_of_non_clean" in
    // its leftward excursion after this point
    // in the loop and, when we hit the left end of "mr",
    // will point off of the left end of the card-table
    // for "mr".
  }

is in an odd position (no code follows it) and the 'after' should be
changed to 'at'.

But instead of changing the comments, the new behavior can be achieved
with a single new hunk of code in do_MemRegion that keeps the comments
relevant:

--- cardTableRS.cpp.org
+++ cardTableRS.cpp
@@ -17,10 +17,22 @@
       // "dirty" range accumulated so far.
       if (start_of_non_clean < end_of_non_clean) {
         const MemRegion mrd(start_of_non_clean, end_of_non_clean);
         _dirty_card_closure->do_MemRegion(mrd);
       }
+
+      // fast forward through potential continuous range of clean cards
+      if (is_word_aligned(cur_entry)) {
+        jbyte* cur_row = cur_entry - BytesPerWord;
+        while (cur_row >= limit &&
+               *((intptr_t*)cur_row) == CardTableRS::clean_card_row()) {
+          cur_row -= BytesPerWord;
+        }
+        cur_entry = cur_row + BytesPerWord;
+        cur_hw = _ct->addr_for(cur_row + BytesPerWord);
+      }
+
       // Reset the dirty window, while continuing to look
       // for the next dirty card that will start a
       // new dirty window.
       end_of_non_clean = cur_hw;
       start_of_non_clean = cur_hw;

I think it's easier to follow.

cardTableRS.hpp:
---------------

  48   // making this constants into inline methods kills performance for some reason
  49   static intptr_t clean_card_row() {
  50     return CardTableModRefBS::clean_card_row;
  51   }

The comment seems to warn against the very thing the code is doing.
If the comment is accurate, it's fine to eliminate the method and use
the enum directly.

  53   static bool  card_is_dirty_wrt_gen_iter(jbyte cv) {
  54     return CardTableModRefBS::card_is_dirty_wrt_gen_iter(cv);
 
This whitespace change looks spurios.

-John

> -------- Original Message --------
> Subject: 	Request for review (S): 7068625 Testing 8 bytes of card table 
> entries at a time speeds up card-scanning
> Date: 	Tue, 21 Feb 2012 12:03:50 +0400
> From: 	Alexey Ragozin <alexey.ragozin at gmail.com>
> To: 	hotspot-gc-dev at openjdk.java.net
> CC: 	Bengt Rutisson <bengt.rutisson at oracle.com>
> 
> 
> 
> Hi,
> 
> I would like few volunteers to review changes for 
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068625
> WebRev: http://cr.openjdk.java.net/~brutisso/7068625/webrev.00/ 
> <http://cr.openjdk.java.net/%7Ebrutisso/7068625/webrev.00/>
> 
> Change summary
> For large heaps (I was focusing on 8GiB and above) it is common to have 
> long continuous ranges of clean cards.
> Patch is introducing a short path for skipping ranges of clean cards 
> using word aligned memory access instead of byte aligned.
> 
> Patch affects serial and CMS collectors. For CMS collector stride size 
> should be increase to see any performance gains (I was using
> -XX:+UnlockDiagnosticVMOptions
> -XX:ParGCCardsPerStrideChunk=4096)
> 
> For testing I was mainly using synthetic benchmark randomly modifying 
> hash tables in heap, thus uniformly touching cards across heaps.
> Average duration of young GC pause were used as KPI.
> More details about testing can be found at
> http://blog.ragozin.info/2011/07/openjdk-patch-cutting-down-gc-pause.html
> Though article is referring jdk6, my resent tests with trunk jdk7 show 
> no difference.
> I was also tested patch with real application (Oracle Coherence storage 
> node).
> With 16GiB of heap and CMS/ParNew GC, enabling patch have shortened GC 
> pauses roughly in 2 times.
> 
> Source code of benchmark used in test are available at
> https://gridkit.googlecode.com/svn/branches/aragozin-sandbox/young-gc-bench
> Main class YoungGCPauseBenchmark
> 
> Regards,
> Alexey
> 


More information about the hotspot-gc-dev mailing list