SIGSEGV in C2 compiler
tom.rodriguez at oracle.com
Wed Feb 2 18:49:17 PST 2011
I'm sorry I didn't follow up on this with you; I lost track of it over the Christmas holiday. This appears to be the same issue as 6675699, where a ConvI2L with a constrained type on it has its input replaced with a constant outside that range. The identity transformation converts that to top, which causes a phi to collapse and the control flow to be killed. We've band-aided this in a few other places, but it appears that loop unswitching can create similar issues. In your original program it was caused by unswitching on a test that would never actually fail, which creates an alternate version of the loop that would throw an exception if it ever executed. In your new example it's this part of the output:
Pop 147 ConvI2L === _ 20 [[ 153 ]] Type:long:1..maxint:www !jvms: Test::main @ bci:32
< long:1..maxint:www < 147 ConvI2L === _ _ [] 
Node 20 is the constant 0 which is outside the range of 1..maxint.
> int:0 20 ConI === 0 [[ 399 104 290 147 369 211 211 188 ]] Type:int:0
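To make the range conflict concrete, here is a minimal sketch of what happens when a node typed 1..maxint is fed the constant 0. The Range class is a hypothetical toy model written for this note, not HotSpot's TypeInt/TypeLong code; it only shows that the two constraints have no value in common, which is the situation that ends up as top.

```java
// Hypothetical toy model of an integer type range; not HotSpot's own type classes.
final class Range {
    final long lo, hi; // inclusive bounds; lo > hi means no value is possible

    Range(long lo, long hi) { this.lo = lo; this.hi = hi; }

    boolean isEmpty() { return lo > hi; }

    // Values that satisfy both constraints at once.
    Range intersect(Range other) {
        return new Range(Math.max(lo, other.lo), Math.min(hi, other.hi));
    }

    public static void main(String[] args) {
        Range convType = new Range(1, Integer.MAX_VALUE); // long:1..maxint on node 147
        Range input = new Range(0, 0);                    // int:0, the ConI at node 20
        // No value is both >= 1 and == 0, so the intersection is empty.
        System.out.println(convType.intersect(input).isEmpty());
    }
}
```

An empty result means the node can never produce a value, so anything that depends on it looks dead to the optimizer, even though the control path leading to it has not been removed.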
I think we're unswitching the (j != 0) test, which results in a version of the loop where j == 0 when we exit it, and that version would result in an exception.
In your original program, I think if you converted your switches into explicit tests for the possible values, the unswitching wouldn't cause any problems. I think you could also make the default case throw an exception to terminate that path. Either of those should also result in a more useful unswitching of your loop.
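As a rough illustration of those two restructurings: the original switch is not shown in this thread, so the class, method names, and values below are all hypothetical. The sketch only shows the two source shapes and makes no claim about what C2 does with them on a particular build.

```java
// Hypothetical example; none of these names or values come from the original program.
final class SwitchShapes {
    // Shape 1: explicit tests for the possible values.
    // Assumes kind is only ever 1 or 2, so the else branch covers 2.
    static int stepWithTests(int kind) {
        return (kind == 1) ? 2 : 4;
    }

    // Shape 2: keep the switch, but make the default case throw so that
    // the "no case matched" path terminates instead of falling through.
    static int stepWithThrowingDefault(int kind) {
        switch (kind) {
            case 1: return 2;
            case 2: return 4;
            default: throw new IllegalStateException("unexpected kind: " + kind);
        }
    }
}
```

In both shapes there is no longer a path on which the result keeps an initial value that no case assigned.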
As far as bug 6675699 itself, I'd hoped we had stamped out the obvious gotchas, since there isn't a really nice fix for it. We can stop the problem from occurring by putting control on the ConvI2L, but that interferes with a lot of other optimizations that take advantage of the improved type information, and we have to work out a solution to that. Anyway, we'll do something about this for 7 and I'll put your reduced test into the bug report. Thanks for distilling it down. Ping me directly if you'd like help restructuring the loop to avoid the problem.
On Feb 2, 2011, at 8:05 AM, Denis Lila wrote:
> Hi Tom.
> I tried to simplify the reproducer for this. I managed to turn
> it into a 20 line file that I've attached. However it must be run
> with -XX:CompileOnly=Test.main. -Xbatch and
> -XX:OnStackReplacePercentage=60 are no longer needed.
> The graphs were generated using the command
> ~/src/jdk7/build/linux-amd64-debug/bin/java -Xbatch -XX:-DoEscapeAnalysis -XX:-SubsumeLoads -XX:-UseLoopPredicate -XX:-PartialPeelLoop -XX:-PartialPeelAtUnsignedTests -XX:+LoopUnswitching -XX:+VerifyGraphEdges -XX:+VerifyIterativeGVN -XX:+TraceIterativeGVN -XX:+TraceOptoParse -XX:+TraceLoopPredicate -XX:+TracePartialPeeling -XX:+TraceLoopUnswitching -XX:+PrintCompilation -XX:PrintIdealGraphFile=./ecjGraph.xml -XX:PrintIdealGraphLevel=3 -XX:+PrintIdeal -XX:+PrintOpto -XX:+Verbose -XX:CompileOnly=Test.h Test
> I tried to turn off as many optimizations as possible.
> With the above command opto/compile.cpp:1673,1689 end up
> executing (they are PhaseIdealLoop constructors). All the xml files
> are from the execution of the first PhaseIdealLoop constructor. By
> its second call the graph seems to already be broken because
> in build_loop_late, build_loop_late_post is called on node 296. The
> control for 296 is determined to be 314. This seems correct, and it
> is the variable Node *early. LCA however, is computed as 39. 39 dominates
> 314 so in the while( early != legal ) we end up bubbling legal up to the
> root, then we call idom(root), and that causes a segfault because the root
> isn't dominated by anything.
> Now, there doesn't seem to be anything wrong with the dominator computations.
> The problem seems to be that in the _igvn.optimize() call at the end of the
> first PhaseIdealLoop constructor, node 357 is replaced by its parent 296 because
> 357 is a phi node and one of its two inputs (node 153) becomes dead. So 296's
> children become 259 and 295. When compute_lca_of_uses is called the control of
> 296 is found to be 303 (this is correct). Then, in the next iteration, we get
> the control of 259 which is 227. So we call dom_lca_for_get_late_ctrl(303, 227, 296)
> which correctly returns 39.
> Now, if 357 hadn't been replaced by 296, in that second iteration the
> if( c->is_Phi() ) path would have been executed, so "use" would have been
> computed to be 357->in(0)->in(j) == 349, instead of 227. Then
> dom_lca_for_get_late_ctrl(303, 349, 296) would have been called, which would
> have returned either 333 or 341 (because 333 may be a split ctrl) both of
> which are dominated by 314, so no crash would result.
> So, the problem seems to be either that the phi's input is killed or that that
> input's corresponding control is not dead. This happens in _igvn.optimize, but
> I can't see any errors there, so I'm thinking the real problem is in the loop
> iteration_split call that precedes it. I haven't found the exact problem yet, but
> I'm working on it.
> Anyway, I hope this helps (or at least makes sense).
> PS: just to clarify, if the program is run with -XX:-LoopUnswitching, there is no
> crash, which supports my beliefs from above.
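The failing walk described in the quoted message (bubbling legal up through idom until it meets early) can be sketched on a toy dominator tree. This is a hypothetical model: the map, the method name, and the node numbers are made up for illustration and are not taken from the crashing graph or from HotSpot's source. Where the real loop dereferences idom(root), this version returns false.

```java
import java.util.Map;

// Hypothetical toy dominator tree; not HotSpot code.
final class DomWalk {
    // idom maps each node to its immediate dominator; the root has no entry.
    // Returns true if walking up from 'legal' reaches 'early', false if the
    // walk runs past the root first (the case that crashes in the real loop).
    static boolean reaches(Map<Integer, Integer> idom, int legal, int early) {
        Integer n = legal;
        while (n != null && n != early) {
            n = idom.get(n); // step to the immediate dominator; null past the root
        }
        return n != null;
    }

    public static void main(String[] args) {
        // Chain 0 <- 1 <- 2 <- 3, with 0 as the root.
        Map<Integer, Integer> idom = Map.of(1, 0, 2, 1, 3, 2);
        System.out.println(reaches(idom, 3, 2)); // legal is below early: walk terminates
        System.out.println(reaches(idom, 1, 2)); // legal is above early: walk passes the root
    }
}
```

The second call mirrors the reported situation, where the computed LCA (39) dominates early (314), so the upward walk can never meet early.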
> ----- Original Message -----
>> I was able to reproduce your crash from the class files. I filed
>> 7004570 for it. Running java -d64 -cp pisces.jar -XX:+PrintCompilation
>> -Xbatch -XX:OnStackReplacePercentage=60 pisces.Test reproduces it
>> reliably for me on Solaris. I'm looking into it now.
>> On Nov 22, 2010, at 12:36 PM, Denis Lila wrote:
>>>> What about the latest hotspot which is hs20-b02?
>>> That also crashes.
>>>> I think we recently fixed a bad graph bug related to EA. You can
>>>> -XX:-DoEscapeAnalysis. Actually if it reproduces with hs17 then
>>>> probably not the same EA bug.
>>> Yes, it also crashes with -XX:-DoEscapeAnalysis.
>>>> Can you provide the ecj compiled class files too?
>>> ----- "Tom Rodriguez" <tom.rodriguez at oracle.com> wrote:
>>>> On Nov 22, 2010, at 11:27 AM, Denis Lila wrote:
>>>>> I'm sorry, I accidentally sent this without finishing it.
>>>>> I meant to say that gdbdump.txt, hotspot.log, and the hs_err_*.log
>>>>> files were obtained using a fastdebug build of hotspot 19-b06.
>>>>> I can reproduce the crash with hotspot 17 too.
>>>>>> I did not submit a bug report at sun.bugs.com because I couldn't
>>>>>> find a way to attach the 4 files.
>>>> Yes that's kind of lame. You can just include a note in the
>>>> description that says to contact you directly for the files.
>>>> Including the hs_err as text is a good idea though.