[9] RFR(S): 8029443: 'assert(klass->is_loader_alive(_is_alive)) failed: must be alive' during VM_CollectForMetadataAllocation

Tobias Hartmann tobias.hartmann at oracle.com
Tue Jul 22 09:22:27 UTC 2014

On 21.07.2014 20:59, Vladimir Kozlov wrote:
> On 7/21/14 1:44 AM, Tobias Hartmann wrote:
>> Vladimir, Coleen, thanks for the reviews!
>> On 18.07.2014 20:09, Vladimir Kozlov wrote:
>>> On 7/18/14 11:02 AM, Coleen Phillimore wrote:
>>>> On 7/18/14, 11:06 AM, Vladimir Kozlov wrote:
>>>>> On 7/18/14 4:38 AM, Tobias Hartmann wrote:
>>>>>> Hi,
>>>>>> I spend some more days and was finally able to implement a test that
>>>>>> deterministically triggers the bug:
>>>>> Why do you need to switch off compressed oops? Do you need to switch
>>>>> off compressed klass pointers too (UseCompressedClassPointers)?
>>>> CompressedOops when off turns off CompressedClassPointers.
>>> You are right, I forgot that. Still the question is why switch off 
>>> coop?
>> I'm only able to reproduce the bug without compressed oops. The original
>> bug also only reproduces with -XX:-UseCompressedOops. I tried to figure
>> out why (on Sparc):
>> With compressed oops enabled, Method* metadata referencing 'WorkerClass'
>> is added to 'doWork' in MacroAssembler::set_narrow_klass(..). In
>> CodeBuffer::finalize_oop_references(..) the metadata is processed and an
>> oop to the class loader 'URLClassLoader' is added. This oop leads to the
>> unloading of 'doWork', hence the verification code is never executed.
>> I'm not sure what set_narrow_klass(..) is used for in this case. I
>> assume it stores a 'WorkerClass' Klass* in a register as part of an
>> optimization? Because 'doWork' potentially works on any class.
>> Apparently this optimization is not performed without compressed oops.
> I would suggest to compare 'doWork' assembler 
> (-XX:CompileCommand=print,TestMethodUnloading::doWork) with coop and 
> without it. Usually loaded into register class is used for klass 
> compare do guard inlining code. Or to initialize new object.
> I don't see loading (constructing) uncompressed (whole) klass pointer 
> from constant in sparc.ad. It could be the reason for different 
> behavior. It could be loaded from constants section. But constants 
> section should have metadata relocation info in such case.

I did as you suggested and found the following:

During the profiling phase the class given to 'doWork' always is 
'WorkerClass'. The C2 compiler therefore optimizes the compiled version 
to expect a 'WorkerClass'. The branch that instantiates a new object is 
guarded by an uncommon trap (class_check). The difference between the 
two versions (with and without compressed oops) is the loading of the 
'WorkerClass' Klass to check if the given class is equal:

With compressed oops:
    SET    narrowklass: precise klass WorkerClass: 
0x00000001004a0d40:Constant:exact *,R_L1 ! compressed klass ptr
    CWBne  R_L2,R_L1,B8      ! compressed ptr  P=0.000001 C=-1.000000

    SET    precise klass WorkerClass: 0x00000001004aeab0:Constant:exact 
*,R_L1      ! non-oop ptr
    CXBpne R_L2,R_L1,B8     ! ptr  P=0.000001 C=-1.000000

R_L2: class given as parameter
B8: location of uncommon trap

In the first case, the Klass is loaded by a 'loadConNKlass' instruction 
that calls MacroAssembler::set_narrow_klass(..) which then creates a 
metadata_Relocation for the 'WorkerClass'. This metada_Relocation is 
processed by CodeBuffer::finalize_oop_references(..) and an oop to 
'WorkerClass' is added. This oop causes the unloading of the method.

In the second case, the Klass is loaded by a 'loadConP_no_oop_cheap' 
instruction that does not create a metadata_Relocation.

I don't understand why the metadata_Relocation in the first case is 
needed? As the test shows it is better to only unload the method if we 
hit the uncommon trap because we could still use other (potentially 
complex) branches of the method.


> thanks,
> Vladimir
>> Best,
>> Tobias
>>> Vladimir
>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.02/
>>>>> Very nice!
>>>> Yes, I agree.  Impressive.
>>>> The refactoring in nmethod.cpp looks good to me.  I have no further
>>>> comments.
>>>> Thanks!
>>>> Coleen
>>>>>> @Vladimir: The test shows why we should only clean the ICs but not
>>>>>> unload the nmethod if possible. The method ' doWork'
>>>>>> is still valid after WorkerClass was unloaded and depending on the
>>>>>> complexity of the method we should avoid unloading it.
>>>>> Make sense.
>>>>>> On Sparc my patch fixes the bug and leads to the nmethod not being
>>>>>> unloaded. The compiled version is therefore used even
>>>>>> after WorkerClass is unloaded.
>>>>>> On x86 the nmethod is unloaded anyway because of a dead oop. This is
>>>>>> probably due to a slightly different implementation
>>>>>> of the ICs. I'll have a closer look to see if we can improve that.
>>>>> Thanks,
>>>>> Vladimir
>>>>>> Thanks,
>>>>>> Tobias
>>>>>> On 16.07.2014 10:36, Tobias Hartmann wrote:
>>>>>>> Sorry, forgot to answer this question:
>>>>>>>> Were you able to create a small test case for it that would be
>>>>>>>> useful to add?
>>>>>>> Unfortunately I was not able to create a test. The bug only
>>>>>>> reproduces on a particular system with a > 30 minute run
>>>>>>> of runThese.
>>>>>>> Best,
>>>>>>> Tobias
>>>>>>> On 16.07.2014 09:54, Tobias Hartmann wrote:
>>>>>>>> Hi Coleen,
>>>>>>>> thanks for the review.
>>>>>>>>> *+           if (csc->is_call_to_interpreted() &&
>>>>>>>>> stub_contains_dead_metadata(is_alive, csc->destination())) {*
>>>>>>>>> *+             csc->set_to_clean();*
>>>>>>>>> *+           }*
>>>>>>>>> This appears in each case.  Can you fold it and the new function
>>>>>>>>> into a function like
>>>>>>>>> clean_call_to_interpreted_stub(is_alive, csc)?
>>>>>>>> I folded it into the function clean_call_to_interpreter_stub(..).
>>>>>>>> New webrev: 
>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.01/
>>>>>>>> Thanks,
>>>>>>>> Tobias
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>> So before the permgen removal embedded method* were oops and 
>>>>>>>>>> they
>>>>>>>>>> were processed in relocInfo::oop_type loop.
>>>>>>>>>> May be instead of specializing opt_virtual_call_type and
>>>>>>>>>> static_call_type call site you can simple add a loop for
>>>>>>>>>> relocInfo::metadata_type (similar to oop_type loop)?
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>> On 7/14/14 4:56 AM, Tobias Hartmann wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> please review the following patch for JDK-8029443.
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8029443
>>>>>>>>>>> Webrev: 
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.00/
>>>>>>>>>>> *Problem*
>>>>>>>>>>> After the tracing/marking phase of GC, 
>>>>>>>>>>> nmethod::do_unloading(..)
>>>>>>>>>>> checks
>>>>>>>>>>> if a nmethod can be unloaded because it contains dead oops. If
>>>>>>>>>>> class
>>>>>>>>>>> unloading occurred we additionally clear all ICs where the 
>>>>>>>>>>> cached
>>>>>>>>>>> metadata refers to an unloaded klass or method. If the nmethod
>>>>>>>>>>> is not
>>>>>>>>>>> unloaded, nmethod::verify_metadata_loaders(..) finally 
>>>>>>>>>>> checks if
>>>>>>>>>>> all
>>>>>>>>>>> metadata is alive. The assert in CheckClass::check_class fails
>>>>>>>>>>> because
>>>>>>>>>>> the nmethod contains Method* metadata corresponding to a dead
>>>>>>>>>>> Klass.
>>>>>>>>>>> The Method* belongs to a to-interpreter stub [1] of an 
>>>>>>>>>>> optimized
>>>>>>>>>>> compiled IC. Normally we clear those stubs prior to
>>>>>>>>>>> verification to
>>>>>>>>>>> avoid dangling references to Method* [2], but only if the stub
>>>>>>>>>>> is not in
>>>>>>>>>>> use, i.e. if the IC is not in to-interpreted mode. In this
>>>>>>>>>>> case the
>>>>>>>>>>> to-interpreter stub may be executed and hand a stale Method*
>>>>>>>>>>> to the
>>>>>>>>>>> interpreter.
>>>>>>>>>>> *Solution
>>>>>>>>>>> *The implementation of nmethod::do_unloading(..) is changed to
>>>>>>>>>>> clean
>>>>>>>>>>> compiled ICs and compiled static calls if they call into a
>>>>>>>>>>> to-interpreter stub that references dead Method* metadata.
>>>>>>>>>>> The patch was affected by the G1 class unloading changes
>>>>>>>>>>> (JDK-8048248)
>>>>>>>>>>> because the method nmethod::do_unloading_parallel(..) was
>>>>>>>>>>> added. I
>>>>>>>>>>> adapted the implementation as well.
>>>>>>>>>>> *
>>>>>>>>>>> Testing
>>>>>>>>>>> *Failing test (runThese)
>>>>>>>>>>> JPRT
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Tobias
>>>>>>>>>>> [1] see CompiledStaticCall::emit_to_interp_stub(..)
>>>>>>>>>>> [2] see nmethod::verify_metadata_loaders(..),
>>>>>>>>>>> static_stub_reloc()->clear_inline_cache() clears the stub

More information about the hotspot-dev mailing list