RFR(S): 8009536: G1: Apache Lucene hang during reference processing

Bengt Rutisson bengt.rutisson at oracle.com
Wed Mar 13 03:22:06 PDT 2013

Hi John,

I'm not too familiar with the do_marking_step() code, so I'm trying to 
get to know it while I'm looking at your webrev. So, maybe I'm 
completely off here, but I think there are a couple of strange things.

After your change, do_marking_step() takes three bools: do_stealing, 
do_termination and is_serial.

This means that we have 8 possible states if we combine all values of 
these bools. I went through all calls to do_marking_step() and it seems 
like we make use of 4 of those states:

   do_marking_step(false, false, true)             (1)

   do_marking_step(true, true, false)              (2)

   do_marking_step(true, true, false)              (2)
   do_marking_step(true, true, true)               (3)

   do_marking_step(false, true, true)              (4)
   do_marking_step(true, true, false)              (2)

I numbered the states (1) - (4). Here are all eight states, with the 
ones we use marked:

 stealing,  termination,  serial    (3)
 stealing, !termination,  serial
 stealing, !termination, !serial
 stealing,  termination, !serial    (2) parallel
!stealing,  termination,  serial    (4)
!stealing, !termination,  serial    (1) serial
!stealing, !termination, !serial
!stealing,  termination, !serial

I think state (2) makes sense as the parallel case. We do stealing and 
termination and we are not serial. State (1) also makes sense. It is 
clearly the serial case. No stealing or termination. Just serial.

Now, case (3) looks odd to me. We say we are serial but we want to do 
stealing and termination. This case comes from CMRemarkTask. It takes an 
is_serial argument in its constructor. We get to case (3) from 
ConcurrentMark::checkpointRootsFinalWork(), where we do:

   if (G1CollectedHeap::use_parallel_gc_threads()) {
   } else {
     CMRemarkTask remarkTask(this, active_workers, true /* is_serial */);

To me this looks like it should lead to the pure serial case (1) instead 
of (3). Do you agree? In that case I think CMRemarkTask::work() needs to 
be changed to do:

         task->do_marking_step(1000000000.0 /* something very large */,
                               !_is_serial /* do_stealing    */,
                               !_is_serial /* do_termination */,
                               _is_serial  /* is_serial      */);

The other case that sticks out is (4). We don't want to do stealing, but 
we want to do termination even though we are serial. This comes from how 
we set up the g1_drain_mark_stack in ConcurrentMark::weakRefsWork(). We 
pass that along to the reference processing, which I think is serial 
(wasn't that the original problem? :) ). We also pass the g1_keep_alive 
which is set up to be in state (1). So, I wonder if it is correct that 
the g1_drain_mark_stack is in state (4). If you agree that we should be 
using the pure serial case here too, I think we need to change 
G1CMDrainMarkingStackClosure::do_void() to be:

       _task->do_marking_step(1000000000.0 /* something very large */,
                              !_is_serial /* do_stealing    */,
                              !_is_serial /* do_termination */,
                              _is_serial  /* is_serial      */);

Again, I'm not sure about this, but if the above is correct we will be 
left with only two states: the parallel and the serial. In that case I 
think it would make sense to remove the do_stealing and do_termination 
arguments from do_marking_step() and just pass in the is_serial argument. 
Internally, do_marking_step() could set up two local variables to track 
stealing and termination. That might be good if we see the need to 
split these up again in the future.


On 3/12/13 11:00 PM, John Cuthbertson wrote:
> Hi Everyone,
> Here's a new webrev based upon comments from Bengt and Thomas.
> http://cr.openjdk.java.net/~johnc/8009536/webrev.1/
> This webrev includes just the changes to resolve the hang seen by 
> overflowing the marking stack during *serial* reference processing. As 
> I said in my response to Bengt, this revision will produce the 
> following assert:
>> [junit4:junit4] 80.722: [GC remark 80.723: [GC ref-proc80.785: [GC 
>> concurrent-mark-reset-for-overflow]
>> [junit4:junit4] # To suppress the following error report, specify 
>> this argument
>> [junit4:junit4] # after -XX: or in .hotspotrc: 
>> SuppressErrorAt=/concurrentMark.cpp:809
>> [junit4:junit4] #
>> [junit4:junit4] # A fatal error has been detected by the Java Runtime 
>> Environment:
>> [junit4:junit4] #
>> [junit4:junit4] #  Internal Error 
>> (/export/workspaces/8009536_3/src/share/vm/gc_implementation/g1/concurrentMark.cpp:809), 
>> pid=16314, tid=14
>> [junit4:junit4] #  assert(_finger == _heap_end) failed: only way to 
>> get here
>> [junit4:junit4] #
>> [junit4:junit4] # JRE version: Java(TM) SE Runtime Environment 
>> (8.0-b79) (build 1.8.0-ea-fastdebug-b79)
>> [junit4:junit4] # Java VM: Java HotSpot(TM) Server VM 
>> (25.0-b23-internal-jvmg mixed mode solaris-x86 )
>> [junit4:junit4] # Core dump written. Default location: 
>> /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/build/analysis/uima/test/J0/core 
>> or core.16314
>> [junit4:junit4] #
>> [junit4:junit4] # An error report file with more information is saved 
>> as:
>> [junit4:junit4] # 
>> /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/build/analysis/uima/test/J0/hs_err_pid16314.log
>> [junit4:junit4] #
>> [junit4:junit4] # If you would like to submit a bug report, please 
>> visit:
>> [junit4:junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
>> [junit4:junit4] #
>> [junit4:junit4] Current thread is 14
>> [junit4:junit4] Dumping core ...
> when run with parallel reference processing enabled. That fix will be 
> sent out shortly.
> JohnC
> On 3/11/2013 2:35 PM, John Cuthbertson wrote:
>> Hi Everyone,
>> Can I have a couple of volunteers review these changes? The webrev 
>> can be found at: http://cr.openjdk.java.net/~johnc/8009536/webrev.0/.
>> First of all - many thanks to Uwe Schindler for discovering and 
>> reporting the problem and providing very clear instructions on how to 
>> reproduce the issue. Many thanks also to Dawid Weiss for stepping 
>> in with a self-contained reproducer.
>> I also wish to thank Bengt for his help. It was Bengt who gave me the 
>> magic proxy formula that allowed Ivy to satisfy and download all the 
>> dependencies for the test case. Bengt also diagnosed the problem and 
>> gave an initial fix (which the changes in the webrev are based upon).
>> Summary:
>> During the remark pause, the execution of the parallel RemarkTask set 
>> the number of worker threads in the ParallelTaskTerminator and the 
>> first and second barrier syncs. During serial reference processing, 
>> the marking stack overflowed, causing the single (VMThread) thread to 
>> enter the overflow handling code in CMTask::do_marking_step(). This 
>> overflow code uses two WorkGangBarrierSyncs to synchronize the threads 
>> before resetting the marking state for restarting marking. The 
>> barrier syncs were waiting for the number of threads that 
>> participated in the RemarkTask but, since only the VM thread was 
>> executing, only a single thread entered the barrier - resulting in 
>> the barrier waiting indefinitely for the other (non-existent) threads.
>> A proposed solution was to call set_phase() to reset the number of 
>> threads in the parallel task terminator and the barriers to the 
>> number of active threads for the reference processing. This solution 
>> ran into a similar hang while processing the JNI references with 
>> parallel reference processing enabled. (In parallel reference 
>> processing, the JNI references are processed serially by the calling 
>> thread.) Resetting the phase to single-threaded before processing the 
>> JNI refs solved the second hang but resulted in an assertion failure: 
>> only a ConcurrentGCThread can enter a barrier sync, and the calling 
>> thread was the VM thread.
>> Furthermore, another problem was discovered. If the marking state is 
>> reset, a subsequent call to set_phase() will assert, as the global 
>> finger has been set to the start of the heap. This was discovered when 
>> the marking stack overflowed during the RemarkTask and parallel 
>> reference processing called set_phase() to reinitialize the number of 
>> workers in the parallel task terminator. It was also discovered when 
>> trying out another proposed solution: adding a start_gc closure to 
>> reference processing which would call set_phase() before each 
>> processing phase. As a result, the marking state is only reset by 
>> worker 0 if an overflow occurs during the concurrent phase of 
>> marking; if an overflow occurs during remark, reference processing is 
>> skipped, and the marking state is reset by the VM thread. Resetting 
>> the marking state before reference processing was a benign error 
>> (objects would be marked but not pushed onto the stack, as they were 
>> no longer below the finger; the objects would then be traced, in the 
>> normal fashion, when marking restarted), but it's better to be safe 
>> than sorry. The other part of the fix for this secondary problem is 
>> that the parallel reference processing task executor now calls the 
>> terminator's reset_for_reuse() routine instead of set_phase().
>> The resulting solution for the hang is based upon the patch sent out 
>> by Bengt - namely we do not enter the sync barriers when 
>> CMTask::do_marking_step() is being called serially. The difference is 
>> that I added an extra parameter to CMTask::do_marking_step() instead 
>> of piggy-backing on the existing parameter list. Additionally, if 
>> this new parameter indicates serial operation, the current thread 
>> will skip offering termination. This allows the serial drain closure 
>> to enter the termination protocol and execute the guarantees 
>> contained therein.
>> The other changes are for the secondary issues, described above, that 
>> were discovered while trying out other possible solutions.
>> Testing:
>> The lucene test case with serial reference processing (with and 
>> without verification); the lucene test case with parallel reference 
>> processing (with and without verification).
>> GC test suite with a mark stack size of 1K and 4K, with both serial 
>> and parallel reference processing (with and without verification).

More information about the hotspot-gc-dev mailing list