G1GC/ JIT compilation bug hunt.
vladimir.kozlov at oracle.com
Fri Aug 16 00:50:25 PDT 2013
On 8/15/13 11:45 PM, Dawid Weiss wrote:
>> It is with high probability Compiler problem.
> I believe so. I've re-run the tests with 1.8b102 and the problem is
> still there, although it's more difficult to show -- I ran a 100 full
> builds yesterday, five of them tripped on assertions that should be
We switched on -XX:+TieredCompilation by default in b102. Switch it off (-XX:-TieredCompilation) to use only the C2
compiler, which is the one that has the problem.
>> G1 has larger write-barrier code than other GCs. It can affect inlining
>> decisions. You can try to change -XX:InlineSmallCode=1000 value. It controls
>> inlining of methods which were already compiled.
>> You can also try -Xbatch -XX:CICompilerCount=1 to get serial compilations.
> Thanks for these tips, Vladimir -- very helpful. I hope you don't mind
> me asking one more question - we had a discussion with another Lucene
> developer yesterday -- is -Xbatch deterministic in the sense that if
> you run a single thread/ deterministic piece of code it will always
> trigger compiles at the same time? What happens if there are two
> uncoordinated threads that hit a set of the same methods (and thus
> when the compiler kicks in the statistics will probably be different
> for each independent run)?
-Xbatch (equivalent to -XX:-BackgroundCompilation) blocks only the thread that first puts a compilation task on the
compile queue. Other threads see that the task is already in the queue and resume execution without waiting.
You still can't get full determinism with several Java threads, as you noted. But combined with -XX:CICompilerCount=1 it
can reduce some variation in inlining decisions, because compilation is then executed by one compiler thread (instead
of 2 by default). So if compilation tasks are put on the queue in the same order in different runs, you will most likely
get the same generated code. Of course, the order is usually slightly different (especially during startup, when there
are a lot of compilation requests), so you can still get different results.
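To make the flag combination above concrete, here is a minimal sketch of a harness that re-runs a failing test in a
fresh JVM with those flags and counts how many runs fail. The class name ReproRunner, its methods, and the class path
and main class arguments are hypothetical placeholders, not something from your build; -XX:+UseG1GC and -ea are my
assumption about how the tests are run.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical harness: re-runs a test class in a fresh JVM and counts failing runs. */
class ReproRunner {

    /** Builds the child JVM command line; classPath and mainClass are placeholders. */
    static List<String> buildCommand(String classPath, String mainClass) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-XX:+UseG1GC");             // assumption: the failure is being hunted under G1
        cmd.add("-XX:-TieredCompilation");   // C2 only
        cmd.add("-Xbatch");                  // foreground compilation
        cmd.add("-XX:CICompilerCount=1");    // a single compiler thread
        cmd.add("-XX:InlineSmallCode=1000"); // value to experiment with
        cmd.add("-ea");                      // assumption: failures show up as Java assertions
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        return cmd;
    }

    /** Runs the command the given number of times; returns how many runs exited non-zero. */
    static int countFailures(List<String> cmd, int runs)
            throws IOException, InterruptedException {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = class path, args[1] = main class of the failing test (supplied by the caller)
        List<String> cmd = buildCommand(args[0], args[1]);
        System.out.println(String.join(" ", cmd));
        System.out.println("failures: " + countFailures(cmd, 100) + " / 100");
    }
}
```

Comparing the failure count across flag sets (for example with and without -XX:InlineSmallCode) gives a rough signal
even when a single run is not reproducible.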
> This question originated from a broader discussion where we were
> wondering how you, the compiler-guru guys approach the debugging in
> case something like this pops up -- a bug that is very hard to
> reproduce, that manifests itself rarely and for which pretty much any
> change at the Java level changes the compilation and thus generates
> completely different code. This seems to be a tough nut to crack.
We usually try to reproduce the problem with a debug version of the VM, which has a lot of asserts, and we may hit one
that helps identify the problem. You are lucky if you can reproduce a problem in a debug VM under a debugger.
We try to get the assembler output of the compiled method during the run in which it crashes. The hs_err file has the
address and offset in the compiled code and a small code snippet, which helps to find the code. After that we "look
hard" at the assembler code and try to figure out what is wrong with it and which part of the compiler could generate
such a code pattern.
There is a debug flag, -XX:AbortVMOnException=java.lang.NullPointerException, which aborts the VM when the named
exception is thrown. And the -XX:+ShowMessageBoxOnError flag lets us attach a debugger to the VM when that happens.
When we get only a core file it is tough. We try to use the Serviceability Agent to extract information, the compiled
code, and other data from it.
Another suggestion for you. Since you can avoid the problem by switching off EA (escape analysis), you can try switching
off only the following:
-XX:-OptimizePtrCompare "Use escape analysis to optimize pointers compare"
-XX:-EliminateAutoBox "Control optimizations for autobox elimination"
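For orientation, here is a small sketch of the kind of Java code these two optimizations are generally understood to
target. It is my illustration, not code from your tests, and the class and method names are made up. Whether C2
actually applies either optimization here depends on inlining and profiling in the particular run.

```java
/** Illustrative patterns only; names are hypothetical. */
class EscapeAnalysisPatterns {

    /** Autoboxing: every iteration unboxes an Integer, adds, and boxes the result again. */
    static int sumBoxed(int n) {
        Integer sum = 0;                 // boxes 0
        for (int i = 0; i < n; i++) {
            sum = sum + i;               // unbox, add, box
        }
        return sum;                      // unbox
    }

    /** Pointer compare: a fresh object that never escapes cannot equal an existing reference. */
    static boolean freshEqualsExisting(Object existing) {
        Object fresh = new Object();
        return fresh == existing;        // always false
    }

    public static void main(String[] args) {
        long total = 0;
        int hits = 0;
        Object existing = new Object();
        // Call the methods often enough that they are likely to become hot and get compiled.
        for (int i = 0; i < 200_000; i++) {
            total += sumBoxed(100);
            if (freshEqualsExisting(existing)) {
                hits++;
            }
        }
        System.out.println(total + " " + hits);
    }
}
```

If the failure goes away with only one of the two flags switched off, that narrows down which part of the compiler to
look at.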