RFR(M) JDK-8014555: G1: Memory ordering problem with Conc refinement and card marking

David Holmes david.holmes at oracle.com
Fri Oct 4 02:15:04 PDT 2013

Hi Mikael,


I don't quite follow the logic for this:

           __ set(rs, cardtable);         // cardtable := <card table base>
           __ ldub(addr, cardtable, tmp); // tmp := [addr + cardtable]

+         __ cmp_and_br_short(tmp, 
G1SATBCardTableModRefBS::g1_young_card_val(), Assembler::equal, 
Assembler::pt, young_card);
+         __ membar(Assembler::Membar_mask_bits(Assembler::StoreLoad));
+         __ ldub(addr, cardtable, tmp); // tmp := [addr + cardtable]

You load the cardtable with no barrier, checks its value and if it seems 
"wrong" (?) you then issue the storeload barrier and try again. Why not 
just insert the storeload barrier between the store and the load?

Similar comment for src/cpu/sparc/vm/macroAssembler_sparc.cpp.



   G1SATBCardTableLoggingModRefBS::write_ref_field_work(void* field,
                                                        oop new_val) {
!   volatile jbyte* byte = byte_for(field);
!   if (*byte == g1_young_gen) {
!     return;
!   }
!   OrderAccess::storeload();

where is the store that you are ordering with?


!       for (; byte <= last_byte; byte++) {
!         if (*byte == g1_young_gen) {
!           continue;
!         }
!         OrderAccess::storeload();
           if (*byte != dirty_card) {
             *byte = dirty_card;

the store seems to be after the barrier ?


On 2/10/2013 10:28 PM, Mikael Gerdin wrote:
> Hi
> Please review my fix for the issue discussed in the "G1 question:
> concurrent cleaning of dirty cards" thread on the hotspot-gc-dev mailing
> list.
> I'd like someone from the compiler (and runtime? the interpreter uses
> macroAssembler_*, right?) teams to at least look at the changes to:
> macroAssembler_*.cpp
> c1_Runtime_*.cpp
> graphKit.cpp
> Problem description:
> G1 has a race where the concurrent refinement thread may miss object
> references in a dirty card.
> The problem arises if the CPU re-orders the load of the old card value
> (which G1 checks to determine if it can skip the barrier)
> before the store to the actual object.
> If that occurs the concurrent refinement thread may have set the card to
> "clean" and proceeded to scan the card but the java thread may have seen
> the "dirty" value and skipped the post barrier.
> Suggested fix:
> * Add a memory fence between the store to a java object and the reading
> of the previous card value.
> * Modify the code for handling young regions so that all writes to young
> regions can skip the fence (since it will never be needed for such
> writes). This introduces a new value in the card table for G1 which
> indicates a young region.
> Performance impact:
> * This fix has a negative throughput performance impact of 1-1.5%
> (tested on x86-AMD x86-Intel and SPARC).
> * We may want to try to get rid of this race at some point by
> redesigning G1's post barrier but there is not enough time to do that
> for JDK8.
> Performance numbers for x86 platforms can be seen here:
> http://cr.openjdk.java.net/~mgerdin/8014555/perf.txt
> Unfortunately the JIRA issue is not externally visible, but the major
> parts of the discussions about this are present in the mailing list
> thread. The bug mostly contains my analysis of the crashes which seems
> to have been caused by this bug.
> Bug link: https://bugs.openjdk.java.net/browse/JDK-8014555
> Webrev: http://cr.openjdk.java.net/~mgerdin/8014555/webrev.0

More information about the hotspot-dev mailing list