RFR: 8213486: SIGSEGV in CompiledMethod::cleanup_inline_caches_impl with AOT

Erik Österlund erik.osterlund at oracle.com
Wed Nov 21 14:26:04 UTC 2018

Hi Vladimir,

Thank you for the explanation.

I inserted some debugging code to be able to draw better conclusions.
I added a bool that tags CompiledMethods visited during code cache
unloading.

When I crash, I find the AOTCompiledMethod in the following state:

* The CodeCache::_unloading_cycle is 2, meaning at least one unloading
cycle has occurred since the VM started.
* My new boolean, which records whether the method was found during a
code cache walk, is false; it has never been observed by GC code cache
unloading.
* The _is_unloading_state of the AOT method is 2, meaning it was
computed in CodeCache::_unloading_cycle 1. So there has been at least
one code cache unloading cycle since the AOT method was created.
* It is in the CompiledMethod state in_use.
* The corresponding CodeToAMethod entry has _state equal to in_use, and 
the _aot pointer points right back to the AOT method as expected.

So the conclusions I can draw from this are that we are looking at an
AOTCompiledMethod that is in_use, alive, and published, and was created
at least one unloading cycle earlier, yet was *not* found by the code
cache unloading that evidently occurred between creating the AOT method
and the crash. In fact, it has not been found in any code cache
unloading event at all.
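To make that reasoning concrete, here is a simplified, hypothetical model of the epoch-based is_unloading() cache. It is not HotSpot code: ModelCodeCache, ModelMethod, encode_state, state_cycle and the publish/unloading_walk/is_stale helpers are invented for illustration. It assumes the state packs the cached answer in bit 0 and the cycle number in the bits above it, which is consistent with a raw state of 2 decoding to cycle 1, and it ignores cycle wrap-around. A method that the walk never visits keeps a state from an older cycle, which is exactly the combination observed above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoding (an assumption, not HotSpot's definition): bit 0 is
// the cached is_unloading answer, the bits above it are the cycle in which
// that answer was computed. encode_state(false, 1) == 2.
inline uint8_t encode_state(bool unloading, uint8_t cycle) {
  assert(cycle <= 0x7F && "cycle must fit above the is_unloading bit");
  return static_cast<uint8_t>((cycle << 1) | (unloading ? 1u : 0u));
}

inline uint8_t state_cycle(uint8_t state) {
  return static_cast<uint8_t>(state >> 1);
}

struct ModelMethod {
  uint8_t is_unloading_state = 0;
  bool visited_by_unloading = false;  // the debugging tag described above
};

struct ModelCodeCache {
  uint8_t unloading_cycle = 1;
  std::vector<ModelMethod*> iterable;  // methods the unloading walk can see

  // A new method caches "not unloading" for the current cycle; it only
  // becomes reachable by the walk if visible_to_iterator is true.
  void publish(ModelMethod& m, bool visible_to_iterator) {
    m.is_unloading_state = encode_state(false, unloading_cycle);
    if (visible_to_iterator) iterable.push_back(&m);
  }

  // Each unloading cycle bumps the epoch and refreshes only what it visits.
  void unloading_walk() {
    ++unloading_cycle;
    for (ModelMethod* m : iterable) {
      m->visited_by_unloading = true;
      m->is_unloading_state = encode_state(false, unloading_cycle);
    }
  }

  // A stale epoch means a later is_unloading() query cannot trust the cached
  // answer and would have to recompute it outside the unloading walk.
  bool is_stale(const ModelMethod& m) const {
    return state_cycle(m.is_unloading_state) != unloading_cycle;
  }
};
```

Publishing a method with visible_to_iterator set to false and then running one unloading_walk() reproduces the observed numbers: an unloading cycle of 2, a raw state of 2, the visited tag still false, and is_stale() returning true.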

So I am certain that the problem is that the AOT method is not found
during our code cache walk, and that the solution is to move the epoch
business to nmethod instead. However, it would be desirable to know
exactly why the AOT method isn't showing up in the code cache walk, but
I'm afraid I can't see right now why it has been missed, only that it
definitely has been missed. :/
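A hypothetical sketch of that direction, under the same simplifying assumptions: the epoch-cached answer lives only in the nmethod-like class, while the AOT-like class simply returns false from is_unloading(), so it no longer matters whether the walk ever saw it. The class names, the current_cycle parameter, the dead_oops flag and the recomputations counter are invented for illustration; this is not the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the proposed split, not the actual HotSpot patch.
class ModelCompiledMethod {
 public:
  virtual ~ModelCompiledMethod() = default;
  // current_cycle stands in for the global unloading cycle counter.
  virtual bool is_unloading(uint8_t current_cycle) = 0;
};

// nmethod-like: keeps the epoch-cached answer, relying on the assumption
// that every live instance is visited during each unloading walk.
class ModelNMethod : public ModelCompiledMethod {
 public:
  bool dead_oops = false;  // hypothetical stand-in for the real liveness check
  int recomputations = 0;  // counts how often the answer is recomputed

  bool is_unloading(uint8_t current_cycle) override {
    assert(current_cycle != 0 && "cycle 0 is reserved for 'never computed'");
    if (cached_cycle_ != current_cycle) {
      // Cached answer is from an older cycle: recompute and re-tag it.
      cached_answer_ = compute_is_unloading();
      cached_cycle_ = current_cycle;
    }
    return cached_answer_;
  }

 private:
  bool compute_is_unloading() {
    ++recomputations;
    return dead_oops;
  }
  uint8_t cached_cycle_ = 0;  // 0 means "never computed" in this model
  bool cached_answer_ = false;
};

// AOT-like: no epoch state at all, so a method missed by the walk can no
// longer be caught with a stale cycle.
class ModelAOTCompiledMethod : public ModelCompiledMethod {
 public:
  bool is_unloading(uint8_t /*current_cycle*/) override { return false; }
};
```

With this split, a missing or stale epoch on the AOT side is harmless, while the nmethod-like side still recomputes its answer at most once per cycle.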


On 2018-11-20 19:08, Vladimir Kozlov wrote:
> To clarify.
> 'invalid' state is set only for AOT methods which were never registered
> (their state is 'not_set' and the corresponding class was not
> initialized yet). Such methods are never visible to iterators and IC caches.
> If an AOT method is registered, its state becomes 'in_use', and in that
> case it is treated and processed as a normal nmethod. The
> AOTCompiledMethod structure is created only in that case.
> Vladimir
> On 11/20/18 3:12 AM, Erik Österlund wrote:
>> Hi Dean,
>> Yeah, I think I misunderstood what I observed. What I know is that
>> in code cache unloading, I'm not getting all is_alive() AOT
>> methods into my iterator, which breaks the assumptions made by the
>> epoch-based scheme for AOT methods.
>> I noticed that AOTCodeHeap::sweep_dependent_methods(int* indexes,
>> int methods_cnt) makes AOTCompiledMethods "invalid" in the AOT
>> heap, so they are no longer observable from the iterators. It then
>> calls the VM_Deoptimize VM operation. Throughout all this, the
>> AOTCompiledMethod is alive(), yet when the iterators ask for all
>> is_alive() AOTCompiledMethods, it is not visible. But I suppose IC
>> caches may still reference these methods and check whether they are
>> is_unloading(), and then we blow up. There may be multiple ways for
>> is_alive() AOTCompiledMethods to be invisible to the iterators yet
>> visible through IC caches using the "invalid" state
>> in the .
>> Anyway, the fix is the same: stop using the epoch state thingy for
>> is_unloading() on AOTCompiledMethod, where it isn't needed and
>> doesn't seem to play well with its rather different life cycle,
>> and just return false instead.
>> Thanks,
>> /Erik
>> On 2018-11-20 00:00, dean.long at oracle.com wrote:
>>> Hi Erik,
>>> On 11/19/18 12:42 PM, Erik Österlund wrote:
>>>> ...except it looks like AOTCompiledMethods, when running with
>>>> tiered compilation, may first be is_alive(), then become
>>>> !is_alive() for a while, and then get resurrected to is_alive()
>>>> using make_entrant().
>>> This doesn't sound quite right. AOTCompiledMethods aren't allowed
>>> to transition to zombie (!alive), only to not_used or not_entrant,
>>> which should still have is_alive() returning true. Maybe some code
>>> is using is_not_entrant() instead of !is_alive()?
>>> dl

More information about the hotspot-dev mailing list