RFR: 8213486: SIGSEGV in CompiledMethod::cleanup_inline_caches_impl with AOT
dean.long at oracle.com
Wed Nov 21 16:32:49 UTC 2018
I think the problem is CompiledMethodIterator vs. NMethodIterator.
CompiledMethodIterator will visit AOT methods, but NMethodIterator will
not. As long as AOT methods can't be unloaded, it makes sense to
continue to use NMethodIterator and move the unloading logic from
CompiledMethod to nmethod.
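To make the iterator mismatch concrete, here is a toy C++ model. All names (ToyMethod, ToyCodeCache, unloading_pass, reachable_but_never_visited) are hypothetical and this is not HotSpot code. It assumes only what is described above: one walk visits AOT methods and the other skips them. It also assumes, for the model, that inline-cache cleanup uses the full walk.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model only; hypothetical names, not HotSpot code.
enum class Kind { JIT, AOT };

struct ToyMethod {
  explicit ToyMethod(Kind k) : kind(k) {}
  Kind kind;
  int unloading_visits = 0;  // how many unloading walks reached this method
};

struct ToyCodeCache {
  std::vector<ToyMethod> methods;

  // Like CompiledMethodIterator: visits every compiled method, AOT included.
  void for_each_compiled(const std::function<void(ToyMethod&)>& f) {
    for (ToyMethod& m : methods) f(m);
  }

  // Like NMethodIterator: visits only JIT-compiled methods, skipping AOT ones.
  void for_each_nmethod(const std::function<void(ToyMethod&)>& f) {
    for (ToyMethod& m : methods) {
      if (m.kind == Kind::JIT) f(m);
    }
  }
};

// An unloading pass that walks the code cache with the nmethod-only iterator.
void unloading_pass(ToyCodeCache& cache) {
  cache.for_each_nmethod([](ToyMethod& m) { m.unloading_visits++; });
}

// Counts methods that the full compiled-method walk (assumed here to be what
// inline-cache cleanup uses) reaches but that no unloading pass ever visited.
int reachable_but_never_visited(ToyCodeCache& cache) {
  int count = 0;
  cache.for_each_compiled([&count](ToyMethod& m) {
    if (m.unloading_visits == 0) count++;
  });
  return count;
}
```

With two JIT methods and one AOT method, the AOT entry is the only one that the full walk reaches without any unloading pass having visited it.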
On 11/21/18 6:26 AM, Erik Österlund wrote:
> Hi Vladimir,
> Thank you for the explanation.
> I inserted some debugging code to be able to draw better conclusions.
> I put in a bool that tags CompiledMethods that are visited during code
> cache unloading.
> When I crash, I find the AOTCompiledMethod in the following state:
> * The CodeCache::_unloading_cycle is 2, meaning there has been at
> least one unloading that occurred since the VM started.
> * My new boolean, which records whether the method was found during
> code cache walking, is false; it has never been observed by GC code
> cache unloading.
> * The _is_unloading_state of the AOT method is 2, meaning it was from
> CodeCache::_unloading_cycle number 1. So there has been at least one
> code cache unloading cycle since the AOT method was created.
> * It is in the CompiledMethod state in_use.
> * The corresponding CodeToAMethod entry has _state equal to in_use,
> and the _aot pointer points right back to the AOT method as expected.
> So the conclusion I can draw from this is that we are looking at an
> AOTCompiledMethod that is in_use, alive, and published, but was
> created at least one unloading cycle earlier, yet was *not* found by
> the code cache unloading that evidently occurred between creating
> the AOT method and the crash. In fact it has not been found in any
> code cache unloading event at all.
> So I am certain that the problem is that the AOT method is not
> found during our code cache walk, and that the solution to that
> problem is to move the epoch business to nmethod instead. However, it
> would be desirable to know exactly why the AOT method isn't showing up
> in the code cache walk, but I'm afraid I can't see right now exactly
> why it has been missed, only that it definitely has been missed. :/
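Here is a minimal sketch of an epoch-cached is_unloading() answer. It assumes the flag lives in bit 0 and the cycle in the higher bits. That assumption is consistent with reading a state of 2 as "from cycle 1" above, but the names and code are hypothetical, not HotSpot's. The model only shows how a method that the unloading walk skips keeps an older stamp and falls back to the slow path when something else queries it later. It does not reproduce why that later query crashed.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only (hypothetical names, not HotSpot code). Assumed packing:
// bit 0 holds the "is unloading" flag, the higher bits hold the unloading
// cycle, so a state of 2 reads as "computed in cycle 1, not unloading".
inline uint8_t pack_state(uint8_t cycle, bool unloading) {
  return static_cast<uint8_t>((cycle << 1) | (unloading ? 1 : 0));
}
inline uint8_t state_cycle(uint8_t state) {
  return static_cast<uint8_t>(state >> 1);
}
inline bool state_unloading(uint8_t state) { return (state & 1) != 0; }

struct ToyEpochMethod {
  uint8_t state = 0;       // 0: never evaluated
  int slow_path_runs = 0;  // how often the cached answer had to be recomputed

  // Hypothetical stand-in for the real, expensive liveness check.
  bool compute_unloading() const { return false; }

  bool is_unloading(uint8_t current_cycle) {
    if (state_cycle(state) == current_cycle) {
      return state_unloading(state);  // fast path: answer is from this cycle
    }
    // Slow path: the answer is from an older cycle, so recompute and restamp.
    ++slow_path_runs;
    bool unloading = compute_unloading();
    state = pack_state(current_cycle, unloading);
    return unloading;
  }
};
```

In the usage pattern the test exercises, both methods are stamped in cycle 1. Only the one reached by the walk is restamped in cycle 2, so the missed one still carries state 2 until a later caller forces the slow path.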
> On 2018-11-20 19:08, Vladimir Kozlov wrote:
>> To clarify.
>> The 'invalid' state is set only for AOT methods which were never
>> registered (their state is 'not_set' and the corresponding class was
>> not initialized yet). Such methods are never visible to iterators and
>> IC caches.
>> If an AOT method is registered, its state becomes 'in_use', and in
>> that case it is treated and processed as a normal nmethod. The
>> AOTCompiledMethod structure is created only in that case.
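The registration rule above can be sketched as a small state machine. This is a toy model with hypothetical names (AotState, ToyAotEntry, register_method, invalidate_unregistered), not the AOTCodeHeap implementation. It encodes only the rule stated here: 'invalid' applies to never-registered methods, and the AOTCompiledMethod exists only once a method is 'in_use'.

```cpp
#include <cassert>
#include <memory>

// Toy model only (hypothetical names, not HotSpot code).
enum class AotState { not_set, in_use, invalid };

struct ToyAotCompiledMethod {};  // stands in for AOTCompiledMethod

struct ToyAotEntry {
  AotState state = AotState::not_set;
  std::unique_ptr<ToyAotCompiledMethod> aot;  // created only on registration

  // Registration: not_set -> in_use, and the compiled-method object is made.
  bool register_method() {
    if (state != AotState::not_set) return false;
    state = AotState::in_use;
    aot.reset(new ToyAotCompiledMethod());
    return true;
  }

  // Only a method that was never registered may become invalid.
  bool invalidate_unregistered() {
    if (state != AotState::not_set) return false;
    state = AotState::invalid;
    return true;
  }

  // Only registered methods are visible to iterators and inline caches.
  bool visible() const { return state == AotState::in_use && aot != nullptr; }
};
```

Under this rule an 'invalid' entry never had a compiled-method object, so nothing an inline cache holds can point at it.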
>> On 11/20/18 3:12 AM, Erik Österlund wrote:
>>> Hi Dean,
>>> Yeah, I think I misunderstood what I observed. So what I know is that
>>> in the code cache unloading, I'm not getting all is_alive() AOT
>>> methods into my iterator, which messes up the assumptions made by
>>> the epoch-based scheme for AOT methods.
>>> I noticed that in AOTCodeHeap::sweep_dependent_methods(int* indexes,
>>> int methods_cnt) we make AOTCompiledMethods "invalid" in the AOT
>>> heap, making them no longer observable from the iterators. It then
>>> calls the VM_Deoptimize VM operation. Throughout all this, the
>>> AOTCompiledMethod is alive (is_alive() returns true), yet when the
>>> iterators ask for all is_alive() AOTCompiledMethods, it won't be
>>> visible. But I suppose IC caches may still reference these methods
>>> and check whether they are is_unloading(), and then we blow up.
>>> There may be multiple ways for is_alive() AOTCompiledMethods to not
>>> be visible from iterators yet be visible through IC caches using the
>>> "invalid" state in the .
>>> Anyway, the fix is the same: stop doing the epoch state thingy for
>>> is_unloading() on AOTCompiledMethod, where it isn't needed and
>>> doesn't seem to play well with the rather different life cycle it
>>> has, and just return false instead.
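Here is a sketch of the proposed shape of the fix, assuming (as stated in this thread) that AOT methods are never unloaded. The epoch-cached answer lives only in the JIT subclass, and the AOT subclass returns false without consulting any per-cycle state. The class names are hypothetical stand-ins for CompiledMethod, nmethod and AOTCompiledMethod. The liveness check is reduced to a constructor flag.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only (hypothetical names, not HotSpot code).
class ToyCompiledMethod {
 public:
  virtual ~ToyCompiledMethod() = default;
  virtual bool is_unloading(uint8_t current_cycle) = 0;
};

// JIT-compiled method: keeps the epoch-cached answer.
class ToyNMethod : public ToyCompiledMethod {
 public:
  explicit ToyNMethod(bool dead_oops) : dead_oops_(dead_oops) {}

  bool is_unloading(uint8_t current_cycle) override {
    if (cached_cycle_ != current_cycle) {
      cached_unloading_ = dead_oops_;  // stand-in for the real liveness check
      cached_cycle_ = current_cycle;
    }
    return cached_unloading_;
  }

 private:
  bool dead_oops_;
  uint8_t cached_cycle_ = 0;
  bool cached_unloading_ = false;
};

// AOT method: no epoch state at all; assumed never to be unloaded.
class ToyAOTCompiledMethod : public ToyCompiledMethod {
 public:
  bool is_unloading(uint8_t) override { return false; }
};
```

Because the AOT answer does not depend on a stamp, it no longer matters whether an unloading walk ever visited the method.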
>>> On 2018-11-20 00:00, dean.long at oracle.com wrote:
>>>> Hi Erik,
>>>> On 11/19/18 12:42 PM, Erik Österlund wrote:
>>>>> ...except it looks like AOTCompiledMethods, when running with
>>>>> tiered compilation, may first be is_alive(), then become
>>>>> !is_alive() for a while, and then get resurrected to is_alive()
>>>>> using make_entrant().
>>>> This doesn't sound quite right. AOTCompiledMethods aren't allowed
>>>> to transition to zombie (!alive), only to not_used or not_entrant,
>>>> which should still have is_alive() returning true. Maybe some code
>>>> is using is_not_entrant() instead of !is_alive()?
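Dean's hypothesis can be illustrated with a toy model (hypothetical names, not HotSpot code) in which not_used and not_entrant still count as alive and only zombie does not. Using is_not_entrant() as a proxy for !is_alive() across a make_not_entrant()/make_entrant() round trip produces the spurious alive, dead, alive sequence described above. A check based on is_alive() stays true throughout.

```cpp
#include <cassert>
#include <vector>

// Toy model only (hypothetical names, not HotSpot code).
enum class ToyState { in_use, not_used, not_entrant, zombie };

struct ToyStateMethod {
  ToyState state = ToyState::in_use;

  bool is_alive() const { return state != ToyState::zombie; }
  bool is_not_entrant() const { return state == ToyState::not_entrant; }

  void make_not_entrant() { state = ToyState::not_entrant; }
  // Brings a not_entrant method back to in_use, as make_entrant() is
  // described doing under tiered compilation in the message above.
  void make_entrant() {
    if (state == ToyState::not_entrant) state = ToyState::in_use;
  }
};

// Records what a liveness check reports after each step of the round trip.
// With use_not_entrant_proxy, "alive" is (wrongly) taken as !is_not_entrant().
std::vector<bool> trace_alive(bool use_not_entrant_proxy) {
  ToyStateMethod m;
  std::vector<bool> seen;
  auto observe = [&]() {
    seen.push_back(use_not_entrant_proxy ? !m.is_not_entrant() : m.is_alive());
  };
  observe();  // in_use
  m.make_not_entrant();
  observe();  // not_entrant
  m.make_entrant();
  observe();  // in_use again
  return seen;
}
```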