RFR(L): 8226705: [REDO] Deoptimize with handshakes

Robbin Ehn robbin.ehn at oracle.com
Wed Sep 11 12:27:28 UTC 2019

Hi Patricio,

On 2019-09-11 08:07, Patricio Chilano wrote:
> Hi Robbin,
> Thanks for looking into this and for making fetch_unroll_info_helper() more 
> clear to read!
> I looked again at the biased locking part only since I don't really know the 
> rest of code. Those look good to me, just some comments:
> src/hotspot/share/runtime/deoptimization.cpp:
> 1) What was the problem of calling revoke_from_deopt_handler() before 
> eliminate_allocations()?

JRT_BLOCK can safepoint, which would leads us back to the original problem.

> 2) If COMPILER2_OR_JVMCI is 0, shouldn't we still call revoke_from_deopt_handler()?

Yes, thanks, fixed!

> src/hotspot/share/runtime/biasedLocking.cpp:
> L737: markWord prototype_header = k->prototype_header();
> Although that will probably get optimized away in product code, I would just 
> place that load inside the assert.


> src/hotspot/share/runtime/biasedLocking.xpp:
> Maybe s/revoke_own_locks()/revoke_own_lock() ?


> Regarding InterpreterRuntime::frequency_counter_overflow_inner() having possibly 
> the same bug, not sure I see the trace for that.

After your rebaised patch, things have changed a bit, so it might be good now. 
(revoke do not safepoint)

I'll post a v3 to RFR mail.

Thanks, Robbin

> Thanks!
> Patricio
> On 9/9/19 9:45 AM, Robbin Ehn wrote:
>> Hi all,
>> Me and Erik had some offline discussions, which ended up in this incremental 
>> change:
>> http://cr.openjdk.java.net/~rehn/8226705/v2/inc/webrev/
>> Full:
>> http://cr.openjdk.java.net/~rehn/8226705/v2/full/webrev/
>> The problem is a nmethods state must never go backwards.
>> When the methods code is set to the compiled code it can be made not_entrant.
>> (it can be set to not_entrant before not having Compiled_lock)
>> When unlinking the methods code must point to the nmethod.
>> If we set code before changing state, it can go backwards.
>> If we set state before setting code, we can't unlink it.
>> This solution uses the new CompiledMethod_lock to synchronize code and state 
>> changes. (unloaded don't need the lock since it require a safepoint to happen)
>> But this resulted in a circular lock issue with OsrList_lock.
>> Therefore OsrList_lock is removed, instead we use CompiledMethod_lock.
>> After this patch it should be possible to remove Compile_lock from some paths. 
>> (Note that JVMCI is missing Compile_lock from several pathes, but fewer with 
>> this changeset)
>> Also InterpreterRuntime::frequency_counter_overflow_inner seem to have the 
>> same bug as the reason for this redo: BiasedLocking::revoke(objects_to_revoke, 
>> thread);
>> The objects could have been rebaised by the time we return from it.
>> I think we should call the new 
>> BiasedLocking::revoke_own_locks(objects_to_revoke->at(i), thread); here also.
>> Passes t1-5 twice and KS 60 min.
>> @Erik I stayed with current is_in_use (not_installed + in_use), there were to 
>> many places and many of them needed an equivalent method if is_in_use was to 
>> be changed (to only in_use). I didn't see any issue with using the current 
>> definition.
>> Thanks, Robbin
>> On 8/29/19 4:35 PM, Erik Österlund wrote:
>>> Hi Robbin,
>>> On 2019-08-29 14:25, Robbin Ehn wrote:
>>>> Hi Erik, thanks for having a look.
>>>> Just some short answers and questions.
>>>> On 8/29/19 12:01 PM, Erik Österlund wrote:
>>>>> Hi Robbin,
>>>>> Glad to see this work making it back again. Last time this patch was 
>>>>> proposed, there were no guards against non-monotonic nmethod state 
>>>>> transitions. Today there are (since 
>>>>> https://bugs.openjdk.java.net/browse/JDK-8224674). In other words, if you 
>>>>> call make_not_entrant() on an nmethod that has racingly become zombie, it 
>>>>> is today impossible to resurrect that nmethod. Instead, make_not_entrant() 
>>>>> will return false.
>>>> Great, I'll have a look.
>>>>> In particular, I think that is what the following code in the patch is 
>>>>> guarding against (in codeCache.cpp):
>>>>> 1150 // Mark methods for deopt (if safe or possible).
>>>>> 1151 void CodeCache::mark_all_nmethods_for_deoptimization() {
>>>>> 1152   MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>>>>> 1153   CompiledMethodIterator 
>>>>> iter(CompiledMethodIterator::only_alive_and_not_unloading);
>>>>> 1154   while(iter.next()) {
>>>>> 1155     CompiledMethod* nm = iter.method();
>>>>> 1156     if (!nm->method()->is_method_handle_intrinsic() &&
>>>>> 1157         !nm->is_not_installed() &&
>>>>> 1158         nm->is_in_use() &&
>>>>> 1159         !nm->is_native_method()) {
>>>>> 1160       // Intrinsics and native methods are never deopted. A method 
>>>>> that is
>>>>> 1161       // not installed yet or is not in use is not safe to deopt; the
>>>>> 1162       // is_in_use() check covers the not_entrant and not zombie cases.
>>>>> 1163       // Note: A not_entrant method can become a zombie at anytime if 
>>>>> it was
>>>>> 1164       // made not_entrant before the previous safepoint/handshake.
>>>>> 1165       nm->mark_for_deoptimization();
>>>>> 1166     }
>>>>> 1167   }
>>>>> 1168 }
>>>>> In fact, this code is also guarding against is_not_installed nmethods 
>>>>> (which was introduced as a new function). If we find that an nmethod is bad 
>>>>> and say it needs deoptimization, and that nmethod is just about to get 
>>>>> installed, it seems to me that skipping that nmethod and not making it not 
>>>>> entrant is really dangerous and will eventually lead to a crash. Because in 
>>>>> ciEnv.cpp we set_code to the about to be installed nmethod, and it will get 
>>>>> called without anything stopping the invalid code from being executed by 
>>>>> mutators, and it was being invalidated for a reason. So basically we really 
>>>>> do have to make these not entrant, or we will blow up.
>>>>> I guess the reason you needed this check is because make_in_use() doesn't 
>>>>> check what the current state is, causing monotonicity problems, making a 
>>>>> potentially already not entrant nmethod become in_use, eventually blowing 
>>>>> up asserts and probably the VM. Regrettably, make_in_use() still does that, 
>>>>> because I had no need for changing it to use nmethod::try_transition in 
>>>>> 8224674, nobody previously could find these nmethods. In retrospect, not 
>>>>> changing that was a bit silly. So if we just change make_in_use() to use 
>>>>> nmethod::try_transition instead, then you can remove the !is_not_installed 
>>>>> check... which I argue we have to do. The reason we never ran in to this 
>>>>> before is because we made the nmethods not entrant in a safepoint, and 
>>>>> between making an nmethod and making it in_use, there is no safepoint 
>>>>> check, so they were never observed in this state.
>>>> Yes, thread A is creating a new method X, it have state not installed.
>>>> If we make method X not entrant, Thread A will flip it back to in_use, which 
>>>> bad
>>>> and we crash. mark_all_nmethods_for_deoptimization is a test method used in the
>>>> new jtreg test, which is the only use (I see dtrace have some hack to use 
>>>> it, it
>>>> can crash the vm).
>>> Exactly. However, what you are missing here is that this does not happen only 
>>> for the whitebox test, now that we do normal deopt with handshakes.
>>> Consider the following race.
>>> A compiler thread compiles a new nmethod. In this nmethod, it's assumed that 
>>> a certain invoke_virtual callsite is completely monomorphic, and the compiler 
>>> chooses to implement the callsite with a direct call, which is only valid 
>>> given its monomorphism. The compilation finishes, and at the very end of the 
>>> process, the code cache lock is grabbed, under which we allocate the new 
>>> nmethod, verify the assumptions made (which still hold), inject entries in 
>>> DependencyContexts so that if these assumptions change, we will deopt, and 
>>> then unlock the code cache lock. The nmethod is now still in the state 
>>> not_installed.
>>> What can happen now is that another thread loads a class that makes the 
>>> callsite potentially megamorphic. Deopt is triggered, walking the dependency 
>>> contexts under the CodeCache_lock, marking things that are now bad. In this 
>>> list we will find the newly compiled nmethod that has not been made in_use 
>>> yet. It is marked for deoptimization. Then the 
>>> Deoptimization::deoptimize_all_marked() function is called to make sure that 
>>> we 1) make all the marked nmethods not_entrant, and 2) arm the bad activation 
>>> records in the stack. Now the first step of the two grabs the code cache 
>>> lock, and walks the code cache. It finds our about to be installed nmethod, 
>>> and shoots make_not_entrant() at it, racing with the compiler thread calling 
>>> make_in_use() on it.
>>> This is why I advocate an approach where we simply make it safe to call 
>>> make_in_use() racing with make_not_entrant(), instead of trying to chase down 
>>> all possible scenarios in which this can happen. I can imagine a number of 
>>> others, now that deoptimization is being done with handshakes.
>>>>> ... and similarly here:
>>>>> 1192     if (nm->is_marked_for_deoptimization() && nm->is_in_use()) {
>>>>> 1193       // only_alive_and_not_unloading() can return not_entrant nmethods.
>>>>> 1194       // A not_entrant method can become a zombie at anytime if it was
>>>>> 1195       // made not_entrant before the previous safepoint/handshake. The
>>>>> 1196       // is_in_use() check covers the not_entrant and not zombie cases
>>>>> 1197       // that have become true after the method was marked for deopt.
>>>>> 1198       nm->make_not_entrant();
>>>>> 1199     }
>>>>> ... we now don't need the guard against is_in_use() here.
>>>> Here we don't need that, but we don't need !nm->is_not_entrant() either, since
>>>> it would return false as you say!?
>>> Indeed. All state checks to see if it is even safe to call the state 
>>> transition function should be removed, as it should always be safe.
>>>> If we should have an early filter it think nm->is_in_use() is clearer than:
>>>> not not entrant ;) (since the not_installed is normally not seen here)
>>> The transitioning functions themselves already have early checks, so there 
>>> really is nothing to gain from checking if we should transition first and 
>>> then doing it.
>>>>> For what I propose to work out, we need to change nmethod::make_in_use to 
>>>>> use nmethod::try_transition, and AOTCompiledMethod::make_entrant() needs to 
>>>>> also return false when encountering a not_entrant nmethod, enforcing 
>>>>> monotonicity, instead of asserting about that. AOT methods may turn 
>>>>> not_in_use AOT methods to in_use, but not not_entrant to in_use. Once 
>>>>> not_entrant, an AOT methods should remain not entrant forever.
>>>> The only time we see methods which are not installed should be methods that 
>>>> will
>>>> never be deoptimize (except when calling the test method
>>>> mark_all_nmethods_for_deoptimization), intrinsic and native.
>>> That's not right. It's visible to a bunch of deoptimizations.
>>>> And if I remember correctly make_in_use is called in a constructor which
>>>> requires extra error-handling. Construction would fail if try_transition 
>>>> failed.
>>>> The assumption about these 'never deopted' methods seem to be just that :)
>>> That is also not right. We grab the code cache lock, create the nmethod, 
>>> verify dependencies, install dependencies, drop the code cache lock, all 
>>> under not_installed. It's only after linking the Method to the code, that we 
>>> eventually set it to in_use. So it's visible for longer than you thought. And 
>>> it's visible to calls, a small instant before we change the state to 
>>> in_use(). That's what I claim isn't right. We don't want to be able to call 
>>> into these guys.
>>>>> So basically, these guards are for something that used to be racy and 
>>>>> dangerous due to the lack of guards inside of the state transitions, and 
>>>>> today that is completely harmless, as the proper guards are in the attempts 
>>>>> to change state. I would prefer to remove these guards, as it is confusing 
>>>>> to the reader that we are guarding against a problem that can't happen. So 
>>>>> I think you can remove the various checks for what state the nmethod is in, 
>>>>> the comments describing races, and just keep the checks if it's a method 
>>>>> handle intrinsic or native method.
>>>> I'll revisit this. Was a few months since I did this.
>>> Okay!
>>>>> In deoptimize.cpp, the fetch_unroll_info_helper function has moved down, 
>>>>> making it difficult to review as shows the code delta as if everything 
>>>>> changed. So I can't see what actually changed there. :( Would you mind 
>>>>> moving that code back so the webrev shows what changed there?
>>>> I have inserted two new static functions before fetch_unroll_info_helper,
>>>> since code from fetch_unroll_info_helper are in these functions diff do this.
>>>> They are static function they are best off before, and putting after them I
>>>> don't think would help the diff tool + I need fwd decl.
>>>> Suggestion?
>>> Okay, I will try to work out what the actual diff is manually.
>>> Thanks,
>>> /Erik
>>>> Thanks, Robbin
>>>>> Thanks,
>>>>> /Erik
>>>>> On 2019-08-28 11:57, Robbin Ehn wrote:
>>>>>> Hi, hotspot-dev was the intended list.
>>>>>> Thanks, Robbin
>>>>>> On 2019-08-28 09:30, Robbin Ehn wrote:
>>>>>>> Hi all, please consider,
>>>>>>> To get away from the issues of us revoking in the handshake, but before 
>>>>>>> deopt
>>>>>>> the locks get re-biased before we hit deopt handler, we instead revoke in 
>>>>>>> deopt
>>>>>>> handler.
>>>>>>> The deopt handler have a JRT_BLOCK/_END which can safepoint so we revoke 
>>>>>>> after
>>>>>>> that but before we start looking at the monitors, with a 
>>>>>>> NoSafepointVerifier.
>>>>>>> Here is the previous changeset on top of jdk/jdk tip but due to merge 
>>>>>>> conflicts
>>>>>>> some pieces did not apply:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-8221734-baseline/webrev/
>>>>>>> So this is what was reviewed.
>>>>>>> The rebase (merge conflict resolutions) and 8224795 bug fix :
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/
>>>>>>> Bug 8224795 fix is here:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/src/hotspot/share/code/nmethod.cpp.sdiff.html 
>>>>>>> After this we have the same functionally as the reverted change-set.
>>>>>>> Here is the changes for doing the revoke in deopt handler instead:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-revoke-on-deopt/webrev/
>>>>>>> This code was messy adding the deopt revocation in 3 places made it worse.
>>>>>>> Here is a refactoring of that code. (also removed a dead method in 
>>>>>>> biasedlocking):
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-refactor/webrev/
>>>>>>> And here is full:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/webrev/
>>>>>>> Also a full squashed patch file:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/full.patch
>>>>>>> Latest test run I did t1-t6, Linux/Windows x64 have passed, MacOSX still
>>>>>>> running.
>>>>>>> Thanks, Robbin

More information about the hotspot-dev mailing list