RFR(S): 8009536: G1: Apache Lucene hang during reference processing
john.cuthbertson at oracle.com
Mon Mar 11 14:35:58 PDT 2013
Can I have a couple of volunteers review these changes? The webrev can
be found at: http://cr.openjdk.java.net/~johnc/8009536/webrev.0/.
First of all - many thanks to Uwe Schindler for discovering an reporting
the problem and providing very clear instructions on how to reproduce
the issue. Many thanks also Dawid Weiss for also stepping in with a
I also wish to thank Bengt for his help. It was Bengt who gave me the
magic proxy formula that allowed Ivy to satisfy and download all the
dependencies for the test case. Bengt also diagnosed the problem and
gave an initial fix (which the changes in the webrev are based upon).
During the remark pause, the execution of the parallel RemarkTask set
the number of workers thread in the ParallelTaskTerminator and the first
and second barrier syncs. During serial reference processing, the
marking stack overflowed causing the single (VMThread) thread to enter
the overflow handling code in CMTask::do_marking_step(). This overflow
code using two WorkBarrierSyncs to synchronize the threads before
resetting the marking state for restarting marking. The barrier syncs
were waiting for the number of threads that participated in the
RemarkTask) but, since only the VM thread was executing, only a single
thread entered the barrier - resulting in the barrier indefinitely
waiting for the other (non existent) threads.
A proposed solution was to call set_phase to reset the number of threads
in the parallel task terminator and the barriers to the number of active
threads for the reference processing. This solution ran into a similar
hang while processing the JNI references with parallel reference
processing enabled. (In parallel reference processing, the JNI
references are processed serially by the calling thread). Resetting the
phase to single-threaded before processing the JNI refs solved the
second hang but resulted in an assertion failure: only a concurrentGC
thread can enter a barrier sync and the calling thread was the VM thread.
Furthermore another problem was discovered. If the marking state is
reset, a subsequent call to set_phase() will assert as the global finger
has been set to start of the heap. This was a discovered by the marking
stack overflowing during the RemarkTask and parallel reference
processing calling set_phase() to reinitialize number of workers in the
parallel task terminator. It was also discovered when trying out another
proposed solution: adding a start_gc closure to reference processing
which would call set_phase() before each processing phase. As a result
the marking state is only reset by worker 0 if an overflow occurs during
the concurrent phase of marking; if an overflow occurs during remark,
reference processing is skipped, and the marking state is reset by the
VM thread. Resetting the marking state before reference processing was a
benign error (objects would be marked but not pushed on to the stack as
they were no longer below the finger; the objects would then be traced,
in the normal fashion, when marking restarted) but it's better to safe
than sorry. The other part of the fix for this secondary problem is that
the parallel reference processing task executor now calls the
terminator's reset_for_reuse() routine instead of set_phase().
The resulting solution for the hang is based upon the patch sent out by
Bengt - namely we do not enter the sync barriers when
CMTask::do_marking_step() is being called serially. The difference is
that I added an extra parameter to CMTask::do_marking_step() instead of
piggy-backing on the existing parameter list. Additionally, if this new
parameter indicates serial operation, the current thread will skip
offering termination. This allows the serial drain closure to enter the
termination protocol and execute the guarantees contained therein.
The other changes are for the secondary issues, described above, that
were discovered while out trying other possible solutions.
The lucene test case with serial reference processing (with and without
verification); the lucene test case with parallel reference processing
(with and without verification).
GC test suite with a mark stack size of 1K and 4K, with both serial and
parallel reference processing (with and without verification).
More information about the hotspot-gc-dev