RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
goetz.lindenmaier at sap.com
Wed Jul 22 16:21:38 UTC 2020
Thanks for the quick reply.
> > > With DeoptimizeObjectsALot enabled internal threads are started that
> > > deoptimize frames and
> > > objects. The number of threads started are given with
> > > DeoptimizeObjectsALotThreadCountAll and
> > > DeoptimizeObjectsALotThreadCountSingle. The former targets all
> > > threads whereas the
> > > latter operates on a single thread selected round robin.
> > >
> > > I removed the mode where deoptimizations were performed at every nth
> > > exit from the runtime. I never used it.
> > Do I get it right? You have a n:1 and a n:all test scenario.
> > n:1: n threads deoptimize 1 Jana thread where n => DOALThreadCountSingle
> > n:m: n threads deoptimize all Java threads where n = DOALThreadCountAll?
> Not quite.
> -XX:+DeoptimizeObjectsALot // required
> Will start m+n threads. Each operating on all existing JavaThreads using
> EscapeBarriers. The
> difference between the 2 thread types is that one distinct EscapeBarrier
> targets either just a
> single thread or all exisitng threads at onece. If just one single thread is
> targeted per
> EscapeBarrier, then it is not always the same thread, but threads are selected
> round robin. So there
> will be n threads selecting independently single threads round robin per
> EscapeBarrier and m threads
> that target all threads in every EscapeBarrier.
Ok, yes, that is how I understood it.
> > > * EscapeBarrier::sync_and_suspend_one(): use a direct handshake and
> > > execute it always independently
> > > of is_thread_fully_suspended().
> > Is this also a performance optimization?
> Maybe a minor one.
> > > * JavaThread::wait_for_object_deoptimization():
> > > - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the
> > > safepoint check! This
> > > caused issues with not walkable stacks with DeoptimizeObjectsALot.
> > OK. As I understand, there was one safepoint check in the old version,
> > now there is one in each iteration. I assume this is intended, right?
> Yes it is. The important thing here is (A) a safepoint check is needed /after/
> leaving a safe state
> (_thread_in_native, _thread_blocked). (B) Shared variables that are modified
> at safepoints or with handshakes need to be reread /after/ the safepoint check.
> BTW: I only noticed now that since JDK-8240918 JavaThreads themselves
> must disarm their polling
> page. Originally (before handshakes) this was done by the VM thread. With
> handshakes it was done by
> the thread executing the handshake op. This was changed for
> OrderAccess::cross_modify_fence() where
> the poll is left armed if the thread is in native and sice JDK-8240918 it is
> always left armed. So
> when a thread leaves a safe state (native, blocked) and there was a
> handshake/vm op, it will always
> call SafepointMechanism::block_if_requested_slow(), even if the
> handshake/vm operation have been
> processed already and everybody else is happyly executing bytecodes :)
> Still (A) and (B) hold.
> > > - Added limited spinning inspired by HandshakeSpinYield to fix regression in
> > > microbenchmark 
> > Ok. Nice improvement, nice catch!
> Yes. It certainly took some time to find out.
> > >
> > > I refer to some more changes answering your questions and comments
> > > below.
> > >
> > > Thanks,
> > > Richard.
> > >
> > >  Microbenchmark:
> > >
> > >
> > > > I understand you annotate at safepoints where the escape analysis
> > > > finds out that an object is "better" than global escape.
> > > > This are the cases where the analysis identifies optimization
> > > > opportunities. These annotations are then used to deoptimize
> > > > frames and the objects referenced by them.
> > > > Doesn't this overestimate the optimized
> > > > objects? E.g., eliminate_alloc_node has many cases where it bails
> > > > out.
> > >
> > > Yes, the implementation is conservative, but it is comparatively simple
> > > the additional debug
> > > info is just 2 flags per safepoint.
> > Thanks. It also helped that you explained to me offline that
> > there are more optimizations than only lock elimination and scalar
> > replacement done based on the ea information.
> > The ea refines the IR graph with allows follow up optimizations
> > which can not easily be tracked back to the escaping objects or
> > the call sites where they do not escape.
> > Thus, if there are non-global escaping objects, you have to
> > deoptimize the frame.
> > Did I repeat that correctly?
> Mostly, but there are also cases where deoptimization is required if and only
> if ea-local objects
> are passed as arguments. This is the case when values are not read directly
> from a frame, but from a callee frame.
Hmm, don't get this completely, but ok.
> > > Accesses to instance
> > > members or array elements can be optimized as well.
> > You mean the compiler can/will ignore volatile or memory ordering
> > requirements for non-escaping objects? Sounds reasonable to do.
> Yes, for instance. Also without volatile modifiers it will eliminate accesses.
> Here is an example:
> Method A has a NoEscape allocation O that is not scalar replaced. A calls
> Method B, which is not
> inlined. When you use your debugger to break in B, then modify a field of O,
> then this modification
> would have no effect without deoptimization, because the jit assumes that B
> cannot modify O without
> a reference to it.
Yes, A can keep O in a register, while the JVMTI thread would write to
the location in the stack where the local is held (if it was written back).
> > > > Syncronization: looks good. I think others had a look at this before.
> > > >
> > > > EscapeBarrier::deoptimize_objects_internal()
> > > > The method name is misleading, it is not used by
> > > > deoptimize_objects().
> > > > Also, method with the same name is in Deopitmization.
> > > > Proposal: deoptimize_objects_thread() ?
> > >
> > > Sorry, but I don't see, why it would be misleading.
> > > What would be the meaning of 'deoptimize_objects_thread'? I don't
> > > understand that name.
> > 1. I have no idea why it's called "_internal". Because it is private?
> > By the name, I would expect that EscapeBarrier::deoptimize_objects()
> > calls it for some internal tasks. But it does not.
> Well, I'd say it is pretty internal, what's happening in that method. So IMHO
> the suffix _internal
> is a match.
> > 2. My proposal: deoptimize_objects_all_threads() iterates all threads
> > and calls deoptimize_objects(_one)_thread(thread) for each of these.
> > That's how I would have named it.
> > But no bike shedding, if you don't see what I mean it's not obvious.
> Ok. We could have a quick call, too, if you like.
Ok, I think I have understood the remaining points. I'm fine with this
More information about the hotspot-runtime-dev