RFR(L): 8226705: [REDO] Deoptimize with handshakes

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue Sep 10 22:19:54 UTC 2019

On 9/9/19 8:45 AM, Robbin Ehn wrote:
> Hi all,
> Me and Erik had some offline discussions, which ended up in this 
> incremental change:
> http://cr.openjdk.java.net/~rehn/8226705/v2/inc/webrev/
> Full:
> http://cr.openjdk.java.net/~rehn/8226705/v2/full/webrev/

Thanks for coming back to this change! I know it has been a
difficult one...

Going with the full webrev since it has been so long since I looked
at the original changes:

     No comments.

     No comments.

     Hmmm... No changes to this file.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     L729: void BiasedLocking::revoke_own_locks(Handle obj, TRAPS) {
         Perhaps add:
           assert(THREAD->is_Java_thread(), "must be called by a 
           JavaThread* thread = (JavaThread*)THREAD;

         and then switch uses of 'THREAD' for 'thread'. This way we
         are not casting away a bad caller...

     L193:   // This should be called by a JavaThread to revoke the bias 
of an owned object
             // This must only be called by a JavaThread to revoke the 
bias of an owned object.

     Moving the top part of Deoptimization::fetch_unroll_info_helper()
     made this hard to review, but I suspect leaving it in place would
     have been just as hard because of the refactor of code into
     eliminate_allocations() and eliminate_locks(). Did lots of
     scrolling of the frames windows and I think it looks correct...

     L152:   static void deoptimize(JavaThread* thread, frame fr, 
RegisterMap *reg_map, DeoptReason reason = Reason_constraint);
         I'm not fond of default parameters, but I'll live... :-)

     old L67:        special        = tty            +   1,
     new L67:        special        = tty            +   2,
         I'll be _so glad_ when the lock ranking stuff is cleaned up!

     No (non-whiny) comments. :-)

     L236:   def(CompiledMethod_lock          , PaddedMutex  , 
special-1,   true,  Monitor::_safepoint_check_never);
         So CompiledMethod_lock is 'tty' + 1. Is that so that it ranks
         lower than the Patching_lock? Any lock ranking issues expected
         for the places you replaced Patching_lock with CompiledMethod_lock?

         I don't think you replaced all uses of Patching_lock, right?
         (Sorry if I asked that before...)

     old L251:   def(OsrList_lock                 , PaddedMutex  , 
leaf,        true,  Monitor::_safepoint_check_never);
         So OsrList_lock was a 'leaf' and you've replaced its uses with
         CompiledMethod_lock which is (special - 1). I'm assuming that
         no lock ranking hiccups have been seen...

     L35: extern Mutex*   CompiledMethod_lock;             // a lock 
used to guard a compiled method
         Should you also mention the OSR queues since CompiledMethod_lock
         also replaced this:

         old L93: extern Mutex*   OsrList_lock; // a lock used to 
serialize access to OSR queues

     No comments.

     No comments.

     No comments.

     No comments.

     No comments.

     No comments (another VM_Op bites the dust...)

     No comments other than: Such a strange place to find the
     VM_DeoptimizeTheWorld VM_Op...

     Shouldn't there be an @bug 8226705 with the new test?

     What tier does the new test execute in and have you verified
     that it passes on all Oracle platforms?

You'll definitely want a review from a Compiler team member
for this one. :-) Since there are Biased Locking changes,
Patricio should also chime in here.

For testing, I'm assuming that the tests that failed due to

     JDK-8221734 Deoptimize with handshakes

have been run on this REDO and that they pass.

Thumbs up! I don't need to see another webrev if you decide to
make any changes due to my few comments above.


> The problem is a nmethods state must never go backwards.
> When the methods code is set to the compiled code it can be made 
> not_entrant.
> (it can be set to not_entrant before not having Compiled_lock)
> When unlinking the methods code must point to the nmethod.
> If we set code before changing state, it can go backwards.
> If we set state before setting code, we can't unlink it.
> This solution uses the new CompiledMethod_lock to synchronize code and 
> state changes. (unloaded don't need the lock since it require a 
> safepoint to happen)
> But this resulted in a circular lock issue with OsrList_lock.
> Therefore OsrList_lock is removed, instead we use CompiledMethod_lock.
> After this patch it should be possible to remove Compile_lock from 
> some paths. (Note that JVMCI is missing Compile_lock from several 
> pathes, but fewer with this changeset)
> Also InterpreterRuntime::frequency_counter_overflow_inner seem to have 
> the same bug as the reason for this redo: 
> BiasedLocking::revoke(objects_to_revoke, thread);
> The objects could have been rebaised by the time we return from it.
> I think we should call the new 
> BiasedLocking::revoke_own_locks(objects_to_revoke->at(i), thread); 
> here also.
> Passes t1-5 twice and KS 60 min.
> @Erik I stayed with current is_in_use (not_installed + in_use), there 
> were to many places and many of them needed an equivalent method if 
> is_in_use was to be changed (to only in_use). I didn't see any issue 
> with using the current definition.
> Thanks, Robbin
> On 8/29/19 4:35 PM, Erik Österlund wrote:
>> Hi Robbin,
>> On 2019-08-29 14:25, Robbin Ehn wrote:
>>> Hi Erik, thanks for having a look.
>>> Just some short answers and questions.
>>> On 8/29/19 12:01 PM, Erik Österlund wrote:
>>>> Hi Robbin,
>>>> Glad to see this work making it back again. Last time this patch 
>>>> was proposed, there were no guards against non-monotonic nmethod 
>>>> state transitions. Today there are (since 
>>>> https://bugs.openjdk.java.net/browse/JDK-8224674). In other words, 
>>>> if you call make_not_entrant() on an nmethod that has racingly 
>>>> become zombie, it is today impossible to resurrect that nmethod. 
>>>> Instead, make_not_entrant() will return false.
>>> Great, I'll have a look.
>>>> In particular, I think that is what the following code in the patch 
>>>> is guarding against (in codeCache.cpp):
>>>> 1150 // Mark methods for deopt (if safe or possible).
>>>> 1151 void CodeCache::mark_all_nmethods_for_deoptimization() {
>>>> 1152   MutexLocker mu(CodeCache_lock, 
>>>> Mutex::_no_safepoint_check_flag);
>>>> 1153   CompiledMethodIterator 
>>>> iter(CompiledMethodIterator::only_alive_and_not_unloading);
>>>> 1154   while(iter.next()) {
>>>> 1155     CompiledMethod* nm = iter.method();
>>>> 1156     if (!nm->method()->is_method_handle_intrinsic() &&
>>>> 1157         !nm->is_not_installed() &&
>>>> 1158         nm->is_in_use() &&
>>>> 1159         !nm->is_native_method()) {
>>>> 1160       // Intrinsics and native methods are never deopted. A 
>>>> method that is
>>>> 1161       // not installed yet or is not in use is not safe to 
>>>> deopt; the
>>>> 1162       // is_in_use() check covers the not_entrant and not 
>>>> zombie cases.
>>>> 1163       // Note: A not_entrant method can become a zombie at 
>>>> anytime if it was
>>>> 1164       // made not_entrant before the previous 
>>>> safepoint/handshake.
>>>> 1165       nm->mark_for_deoptimization();
>>>> 1166     }
>>>> 1167   }
>>>> 1168 }
>>>> In fact, this code is also guarding against is_not_installed 
>>>> nmethods (which was introduced as a new function). If we find that 
>>>> an nmethod is bad and say it needs deoptimization, and that nmethod 
>>>> is just about to get installed, it seems to me that skipping that 
>>>> nmethod and not making it not entrant is really dangerous and will 
>>>> eventually lead to a crash. Because in ciEnv.cpp we set_code to the 
>>>> about to be installed nmethod, and it will get called without 
>>>> anything stopping the invalid code from being executed by mutators, 
>>>> and it was being invalidated for a reason. So basically we really 
>>>> do have to make these not entrant, or we will blow up.
>>>> I guess the reason you needed this check is because make_in_use() 
>>>> doesn't check what the current state is, causing monotonicity 
>>>> problems, making a potentially already not entrant nmethod become 
>>>> in_use, eventually blowing up asserts and probably the VM. 
>>>> Regrettably, make_in_use() still does that, because I had no need 
>>>> for changing it to use nmethod::try_transition in 8224674, nobody 
>>>> previously could find these nmethods. In retrospect, not changing 
>>>> that was a bit silly. So if we just change make_in_use() to use 
>>>> nmethod::try_transition instead, then you can remove the 
>>>> !is_not_installed check... which I argue we have to do. The reason 
>>>> we never ran in to this before is because we made the nmethods not 
>>>> entrant in a safepoint, and between making an nmethod and making it 
>>>> in_use, there is no safepoint check, so they were never observed in 
>>>> this state.
>>> Yes, thread A is creating a new method X, it have state not installed.
>>> If we make method X not entrant, Thread A will flip it back to 
>>> in_use, which bad
>>> and we crash. mark_all_nmethods_for_deoptimization is a test method 
>>> used in the
>>> new jtreg test, which is the only use (I see dtrace have some hack 
>>> to use it, it
>>> can crash the vm).
>> Exactly. However, what you are missing here is that this does not 
>> happen only for the whitebox test, now that we do normal deopt with 
>> handshakes.
>> Consider the following race.
>> A compiler thread compiles a new nmethod. In this nmethod, it's 
>> assumed that a certain invoke_virtual callsite is completely 
>> monomorphic, and the compiler chooses to implement the callsite with 
>> a direct call, which is only valid given its monomorphism. The 
>> compilation finishes, and at the very end of the process, the code 
>> cache lock is grabbed, under which we allocate the new nmethod, 
>> verify the assumptions made (which still hold), inject entries in 
>> DependencyContexts so that if these assumptions change, we will 
>> deopt, and then unlock the code cache lock. The nmethod is now still 
>> in the state not_installed.
>> What can happen now is that another thread loads a class that makes 
>> the callsite potentially megamorphic. Deopt is triggered, walking the 
>> dependency contexts under the CodeCache_lock, marking things that are 
>> now bad. In this list we will find the newly compiled nmethod that 
>> has not been made in_use yet. It is marked for deoptimization. Then 
>> the Deoptimization::deoptimize_all_marked() function is called to 
>> make sure that we 1) make all the marked nmethods not_entrant, and 2) 
>> arm the bad activation records in the stack. Now the first step of 
>> the two grabs the code cache lock, and walks the code cache. It finds 
>> our about to be installed nmethod, and shoots make_not_entrant() at 
>> it, racing with the compiler thread calling make_in_use() on it.
>> This is why I advocate an approach where we simply make it safe to 
>> call make_in_use() racing with make_not_entrant(), instead of trying 
>> to chase down all possible scenarios in which this can happen. I can 
>> imagine a number of others, now that deoptimization is being done 
>> with handshakes.
>>>> ... and similarly here:
>>>> 1192     if (nm->is_marked_for_deoptimization() && nm->is_in_use()) {
>>>> 1193       // only_alive_and_not_unloading() can return not_entrant 
>>>> nmethods.
>>>> 1194       // A not_entrant method can become a zombie at anytime 
>>>> if it was
>>>> 1195       // made not_entrant before the previous 
>>>> safepoint/handshake. The
>>>> 1196       // is_in_use() check covers the not_entrant and not 
>>>> zombie cases
>>>> 1197       // that have become true after the method was marked for 
>>>> deopt.
>>>> 1198       nm->make_not_entrant();
>>>> 1199     }
>>>> ... we now don't need the guard against is_in_use() here.
>>> Here we don't need that, but we don't need !nm->is_not_entrant() 
>>> either, since
>>> it would return false as you say!?
>> Indeed. All state checks to see if it is even safe to call the state 
>> transition function should be removed, as it should always be safe.
>>> If we should have an early filter it think nm->is_in_use() is 
>>> clearer than:
>>> not not entrant ;) (since the not_installed is normally not seen here)
>> The transitioning functions themselves already have early checks, so 
>> there really is nothing to gain from checking if we should transition 
>> first and then doing it.
>>>> For what I propose to work out, we need to change 
>>>> nmethod::make_in_use to use nmethod::try_transition, and 
>>>> AOTCompiledMethod::make_entrant() needs to also return false when 
>>>> encountering a not_entrant nmethod, enforcing monotonicity, instead 
>>>> of asserting about that. AOT methods may turn not_in_use AOT 
>>>> methods to in_use, but not not_entrant to in_use. Once not_entrant, 
>>>> an AOT methods should remain not entrant forever.
>>> The only time we see methods which are not installed should be 
>>> methods that will
>>> never be deoptimize (except when calling the test method
>>> mark_all_nmethods_for_deoptimization), intrinsic and native.
>> That's not right. It's visible to a bunch of deoptimizations.
>>> And if I remember correctly make_in_use is called in a constructor 
>>> which
>>> requires extra error-handling. Construction would fail if 
>>> try_transition failed.
>>> The assumption about these 'never deopted' methods seem to be just 
>>> that :)
>> That is also not right. We grab the code cache lock, create the 
>> nmethod, verify dependencies, install dependencies, drop the code 
>> cache lock, all under not_installed. It's only after linking the 
>> Method to the code, that we eventually set it to in_use. So it's 
>> visible for longer than you thought. And it's visible to calls, a 
>> small instant before we change the state to in_use(). That's what I 
>> claim isn't right. We don't want to be able to call into these guys.
>>>> So basically, these guards are for something that used to be racy 
>>>> and dangerous due to the lack of guards inside of the state 
>>>> transitions, and today that is completely harmless, as the proper 
>>>> guards are in the attempts to change state. I would prefer to 
>>>> remove these guards, as it is confusing to the reader that we are 
>>>> guarding against a problem that can't happen. So I think you can 
>>>> remove the various checks for what state the nmethod is in, the 
>>>> comments describing races, and just keep the checks if it's a 
>>>> method handle intrinsic or native method.
>>> I'll revisit this. Was a few months since I did this.
>> Okay!
>>>> In deoptimize.cpp, the fetch_unroll_info_helper function has moved 
>>>> down, making it difficult to review as shows the code delta as if 
>>>> everything changed. So I can't see what actually changed there. :( 
>>>> Would you mind moving that code back so the webrev shows what 
>>>> changed there?
>>> I have inserted two new static functions before 
>>> fetch_unroll_info_helper,
>>> since code from fetch_unroll_info_helper are in these functions diff 
>>> do this.
>>> They are static function they are best off before, and putting after 
>>> them I
>>> don't think would help the diff tool + I need fwd decl.
>>> Suggestion?
>> Okay, I will try to work out what the actual diff is manually.
>> Thanks,
>> /Erik
>>> Thanks, Robbin
>>>> Thanks,
>>>> /Erik
>>>> On 2019-08-28 11:57, Robbin Ehn wrote:
>>>>> Hi, hotspot-dev was the intended list.
>>>>> Thanks, Robbin
>>>>> On 2019-08-28 09:30, Robbin Ehn wrote:
>>>>>> Hi all, please consider,
>>>>>> To get away from the issues of us revoking in the handshake, but 
>>>>>> before deopt
>>>>>> the locks get re-biased before we hit deopt handler, we instead 
>>>>>> revoke in deopt
>>>>>> handler.
>>>>>> The deopt handler have a JRT_BLOCK/_END which can safepoint so we 
>>>>>> revoke after
>>>>>> that but before we start looking at the monitors, with a 
>>>>>> NoSafepointVerifier.
>>>>>> Here is the previous changeset on top of jdk/jdk tip but due to 
>>>>>> merge conflicts
>>>>>> some pieces did not apply:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-8221734-baseline/webrev/ 
>>>>>> So this is what was reviewed.
>>>>>> The rebase (merge conflict resolutions) and 8224795 bug fix :
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/
>>>>>> Bug 8224795 fix is here:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/src/hotspot/share/code/nmethod.cpp.sdiff.html 
>>>>>> After this we have the same functionally as the reverted change-set.
>>>>>> Here is the changes for doing the revoke in deopt handler instead:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-revoke-on-deopt/webrev/ 
>>>>>> This code was messy adding the deopt revocation in 3 places made 
>>>>>> it worse.
>>>>>> Here is a refactoring of that code. (also removed a dead method 
>>>>>> in biasedlocking):
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-refactor/webrev/
>>>>>> And here is full:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/webrev/
>>>>>> Also a full squashed patch file:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/full.patch
>>>>>> Latest test run I did t1-t6, Linux/Windows x64 have passed, 
>>>>>> MacOSX still
>>>>>> running.
>>>>>> Thanks, Robbin

More information about the hotspot-dev mailing list