RFR(L): 8226705: [REDO] Deoptimize with handshakes

Robbin Ehn robbin.ehn at oracle.com
Thu Aug 29 12:25:45 UTC 2019

Hi Erik, thanks for having a look.

Just some short answers and questions.

On 8/29/19 12:01 PM, Erik Österlund wrote:
> Hi Robbin,
> Glad to see this work making it back again. Last time this patch was proposed, 
> there were no guards against non-monotonic nmethod state transitions. Today 
> there are (since https://bugs.openjdk.java.net/browse/JDK-8224674). In other 
> words, if you call make_not_entrant() on an nmethod that has racingly become 
> zombie, it is today impossible to resurrect that nmethod. Instead, 
> make_not_entrant() will return false.

Great, I'll have a look.

> In particular, I think that is what the following code in the patch is guarding 
> against (in codeCache.cpp):
> 1150 // Mark methods for deopt (if safe or possible).
> 1151 void CodeCache::mark_all_nmethods_for_deoptimization() {
> 1152   MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
> 1153   CompiledMethodIterator 
> iter(CompiledMethodIterator::only_alive_and_not_unloading);
> 1154   while(iter.next()) {
> 1155     CompiledMethod* nm = iter.method();
> 1156     if (!nm->method()->is_method_handle_intrinsic() &&
> 1157         !nm->is_not_installed() &&
> 1158         nm->is_in_use() &&
> 1159         !nm->is_native_method()) {
> 1160       // Intrinsics and native methods are never deopted. A method that is
> 1161       // not installed yet or is not in use is not safe to deopt; the
> 1162       // is_in_use() check covers the not_entrant and not zombie cases.
> 1163       // Note: A not_entrant method can become a zombie at anytime if it was
> 1164       // made not_entrant before the previous safepoint/handshake.
> 1165       nm->mark_for_deoptimization();
> 1166     }
> 1167   }
> 1168 }
> In fact, this code is also guarding against is_not_installed nmethods (which was 
> introduced as a new function). If we find that an nmethod is bad and say it 
> needs deoptimization, and that nmethod is just about to get installed, it seems 
> to me that skipping that nmethod and not making it not entrant is really 
> dangerous and will eventually lead to a crash. Because in ciEnv.cpp we set_code 
> to the about to be installed nmethod, and it will get called without anything 
> stopping the invalid code from being executed by mutators, and it was being 
> invalidated for a reason. So basically we really do have to make these not 
> entrant, or we will blow up.
> I guess the reason you needed this check is because make_in_use() doesn't check 
> what the current state is, causing monotonicity problems, making a potentially 
> already not entrant nmethod become in_use, eventually blowing up asserts and 
> probably the VM. Regrettably, make_in_use() still does that, because I had no 
> need for changing it to use nmethod::try_transition in 8224674, nobody 
> previously could find these nmethods. In retrospect, not changing that was a bit 
> silly. So if we just change make_in_use() to use nmethod::try_transition 
> instead, then you can remove the !is_not_installed check... which I argue we 
> have to do. The reason we never ran in to this before is because we made the 
> nmethods not entrant in a safepoint, and between making an nmethod and making it 
> in_use, there is no safepoint check, so they were never observed in this state.

Yes, thread A is creating a new method X, it have state not installed.
If we make method X not entrant, Thread A will flip it back to in_use, which bad
and we crash. mark_all_nmethods_for_deoptimization is a test method used in the
new jtreg test, which is the only use (I see dtrace have some hack to use it, it
can crash the vm).

> ... and similarly here:
> 1192     if (nm->is_marked_for_deoptimization() && nm->is_in_use()) {
> 1193       // only_alive_and_not_unloading() can return not_entrant nmethods.
> 1194       // A not_entrant method can become a zombie at anytime if it was
> 1195       // made not_entrant before the previous safepoint/handshake. The
> 1196       // is_in_use() check covers the not_entrant and not zombie cases
> 1197       // that have become true after the method was marked for deopt.
> 1198       nm->make_not_entrant();
> 1199     }
> ... we now don't need the guard against is_in_use() here.

Here we don't need that, but we don't need !nm->is_not_entrant() either, since
it would return false as you say!?
If we should have an early filter it think nm->is_in_use() is clearer than:
not not entrant ;) (since the not_installed is normally not seen here)

> For what I propose to work out, we need to change nmethod::make_in_use to use 
> nmethod::try_transition, and AOTCompiledMethod::make_entrant() needs to also 
> return false when encountering a not_entrant nmethod, enforcing monotonicity, 
> instead of asserting about that. AOT methods may turn not_in_use AOT methods to 
> in_use, but not not_entrant to in_use. Once not_entrant, an AOT methods should 
> remain not entrant forever.

The only time we see methods which are not installed should be methods that will
never be deoptimize (except when calling the test method
mark_all_nmethods_for_deoptimization), intrinsic and native.
And if I remember correctly make_in_use is called in a constructor which
requires extra error-handling. Construction would fail if try_transition failed.
The assumption about these 'never deopted' methods seem to be just that :)

> So basically, these guards are for something that used to be racy and dangerous 
> due to the lack of guards inside of the state transitions, and today that is 
> completely harmless, as the proper guards are in the attempts to change state. I 
> would prefer to remove these guards, as it is confusing to the reader that we 
> are guarding against a problem that can't happen. So I think you can remove the 
> various checks for what state the nmethod is in, the comments describing races, 
> and just keep the checks if it's a method handle intrinsic or native method.

I'll revisit this. Was a few months since I did this.

> In deoptimize.cpp, the fetch_unroll_info_helper function has moved down, making 
> it difficult to review as shows the code delta as if everything changed. So I 
> can't see what actually changed there. :( Would you mind moving that code back 
> so the webrev shows what changed there?

I have inserted two new static functions before fetch_unroll_info_helper,
since code from fetch_unroll_info_helper are in these functions diff do this.
They are static function they are best off before, and putting after them I
don't think would help the diff tool + I need fwd decl.

Thanks, Robbin

> Thanks,
> /Erik
> On 2019-08-28 11:57, Robbin Ehn wrote:
>> Hi, hotspot-dev was the intended list.
>> Thanks, Robbin
>> On 2019-08-28 09:30, Robbin Ehn wrote:
>>> Hi all, please consider,
>>> To get away from the issues of us revoking in the handshake, but before deopt
>>> the locks get re-biased before we hit deopt handler, we instead revoke in deopt
>>> handler.
>>> The deopt handler have a JRT_BLOCK/_END which can safepoint so we revoke after
>>> that but before we start looking at the monitors, with a NoSafepointVerifier.
>>> Here is the previous changeset on top of jdk/jdk tip but due to merge conflicts
>>> some pieces did not apply:
>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-8221734-baseline/webrev/
>>> So this is what was reviewed.
>>> The rebase (merge conflict resolutions) and 8224795 bug fix :
>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/
>>> Bug 8224795 fix is here:
>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/src/hotspot/share/code/nmethod.cpp.sdiff.html 
>>> After this we have the same functionally as the reverted change-set.
>>> Here is the changes for doing the revoke in deopt handler instead:
>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-revoke-on-deopt/webrev/
>>> This code was messy adding the deopt revocation in 3 places made it worse.
>>> Here is a refactoring of that code. (also removed a dead method in 
>>> biasedlocking):
>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-refactor/webrev/
>>> And here is full:
>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/webrev/
>>> Also a full squashed patch file:
>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/full.patch
>>> Latest test run I did t1-t6, Linux/Windows x64 have passed, MacOSX still
>>> running.
>>> Thanks, Robbin

More information about the hotspot-runtime-dev mailing list