RFR(XL): 8185640: Thread-local handshakes

Hohensee, Paul hohensee at amazon.com
Thu Oct 26 23:54:35 UTC 2017

As a reference point, Android Java branches on a flag in thread-local storage (TLS) rather than issuing a poisoned-page probe. On x86, at least, there’s no performance disadvantage: branch prediction makes the compare-and-branch pair effectively a single-cycle operation in the vast majority of cases.
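For concreteness, here is a minimal sketch of the two poll styles in C. It assumes Linux/POSIX signals and GCC or Clang (for `__builtin_expect`). All names (`ThreadState`, `poll_flag`, `poll_page_probe`, `arm_page`) are hypothetical; this is not HotSpot or Android code. The `siglongjmp` out of the SIGSEGV handler only stands in for the real work of looking up the faulting PC and taking the right trampoline into the runtime.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS, SA_SIGINFO on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Style 1: compare-and-branch on a per-thread flag. */
typedef struct {
    atomic_int poll_armed; /* set by another thread to request a stop */
    int stops;             /* times this thread entered the slow path */
} ThreadState;             /* hypothetical per-thread state */

static void slow_path(ThreadState *t) {
    t->stops++;
    /* Toy stand-in for "the requested operation has completed". */
    atomic_store_explicit(&t->poll_armed, 0, memory_order_relaxed);
}

static inline void poll_flag(ThreadState *t) {
    /* Almost never taken, so the branch predictor hides its cost. */
    if (__builtin_expect(atomic_load_explicit(&t->poll_armed, memory_order_acquire), 0))
        slow_path(t);
}

/* Style 2: a plain load from a page that is made unreadable to request a stop. */
static volatile char *poll_page;
static long page_size;
static sigjmp_buf poll_env;
static volatile sig_atomic_t traps;

static void segv_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    if ((char *)info->si_addr == (char *)poll_page) { /* fault came from the poll */
        traps++;
        siglongjmp(poll_env, 1); /* toy stand-in for the trampoline into the runtime */
    }
    signal(SIGSEGV, SIG_DFL); /* a genuine crash: re-fault with the default action */
}

static void setup_poll_page(void) {
    page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    void *p = mmap(NULL, (size_t)page_size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) abort();
    poll_page = p;

    struct sigaction sa = {0};
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) abort();
}

/* Requesting side: arming flips the page protection instead of a flag. */
static void arm_page(int armed) {
    if (mprotect((void *)poll_page, (size_t)page_size,
                 armed ? PROT_NONE : PROT_READ) != 0)
        abort();
}

/* Returns 1 if the probe trapped, 0 if the load simply succeeded. */
static int poll_page_probe(void) {
    if (sigsetjmp(poll_env, 1) == 0) { /* 1: restore the signal mask on siglongjmp */
        char c = *poll_page;           /* volatile load: cannot be optimized away */
        (void)c;
        return 0;
    }
    return 1;
}
```

The flag poll pays for a load plus a well-predicted branch on every check. The page probe pays only for a load, but each stop goes through a kernel-delivered SIGSEGV. Arming and disarming also change page protection, and on a multiprocessor that typically requires TLB shootdowns.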

The interpreter was built at a time when branches were comparatively expensive, as evidenced by the prediction bits in the sparc64 predicted-branch instructions. The compare-and-branch sequence takes up icache space in the interpreter (vs. zero for switching the dispatch table), and icache is still a limited resource on modern processors, so that’s an argument for switching dispatch tables. For compiled code, compare-and-branch takes a bit more space than the current poison page probe, but not enough to matter, in my opinion. Compiled code is executed far more than interpreter code, so I’d prioritize compiled-code performance.
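To make the icache argument concrete, here is a toy C model of the two interpreter options. It assumes a two-opcode bytecode set and switching a single table pointer. Every name is hypothetical, and HotSpot's template interpreter is organized differently.

In `run`, arming a stop means pointing `active_table` at `safepoint_table`, so the dispatch fast path carries no poll. `run_with_branch` shows the alternative: a compare-and-branch executed on every dispatch. In a real interpreter, that sequence would typically be emitted into each bytecode's code, which is where the icache cost comes from.

```c
#include <assert.h>

enum { OP_INC, OP_HALT, NUM_OPS }; /* hypothetical toy bytecodes */

typedef struct Interp Interp;
typedef void (*Handler)(Interp *);

struct Interp {
    const unsigned char *pc;     /* next bytecode */
    long acc;                    /* toy accumulator */
    int running;
    const Handler *active_table; /* table the dispatch loop currently indexes */
    int safepoints_taken;
};

static void op_inc(Interp *in)  { in->acc++; in->pc++; }
static void op_halt(Interp *in) { in->running = 0; }

static const Handler normal_table[NUM_OPS] = { op_inc, op_halt };

/* Every safepoint-table entry takes the stop, switches back to the
   normal table, and then executes the bytecode it was dispatched for. */
static void safepoint_entry(Interp *in) {
    in->safepoints_taken++;
    in->active_table = normal_table;
    normal_table[*in->pc](in);
}
static const Handler safepoint_table[NUM_OPS] = { safepoint_entry, safepoint_entry };

/* Table switching: arming means pointing active_table at safepoint_table,
   so the fast path below contains no poll at all. */
static void run(Interp *in) {
    while (in->running) {
        assert(*in->pc < NUM_OPS);
        in->active_table[*in->pc](in);
    }
}

/* Branching: every dispatch pays for a compare-and-branch on a flag. */
static void run_with_branch(Interp *in, volatile int *poll_armed) {
    while (in->running) {
        if (*poll_armed) {
            in->safepoints_taken++;
            *poll_armed = 0;
        }
        assert(*in->pc < NUM_OPS);
        normal_table[*in->pc](in);
    }
}
```

The toy runs single-threaded. In a real VM another thread does the arming, so the table switch would also need appropriate memory ordering.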



On 10/26/17, 10:20 AM, "hotspot-dev on behalf of Andrew Haley" <hotspot-dev-bounces at openjdk.java.net on behalf of aph at redhat.com> wrote:

    On 26/10/17 18:00, Erik Osterlund wrote:
    > Hi Andrew,
    >> On 26 Oct 2017, at 18:05, Andrew Haley <aph at redhat.com> wrote:
    >>> On 26/10/17 15:39, Erik Österlund wrote:
    >>> The reason we do not poll the page in the interpreter is that we
    >>> need to generate appropriate relocation entries in the code blob for
    >>> the PCs that we poll on, so that the signal handler can look up the
    >>> code blob, walk the relocation entries, and determine precisely that
    >>> the trap was caused by the poll, and what kind of poll it was, so we
    >>> know which trampoline to take into the runtime.
    >> Not really, no.  If we know that we're in the interpreter and the
    >> faulting address is the safepoint poll, then we can read all of the
    >> context we need from the interpreter registers and the frame.
    > That sounds like what I said.
    Not exactly.  We do not need to generate any more relocation entries.
    > But the cost of the conditional branch is empirically (this was
    > attempted and measured a while ago) approximately the same as the
    > indirect load during "normal circumstances". The indirect load was
    > only marginally better.
    That's interesting.  The cost of the SEGV trap going through the
    kernel is fairly high, and I'm now wondering if, for very fast
    safepoint responses, we'd be better off not doing it.  The cost of the
    write protect, given that it probably involves an IPI on all
    processors, isn't cheap either.
    >>> While constructing something that does that is indeed possible, it
    >>> simply did not seem worth the trouble compared to using a branch in
    >>> these paths. The same reasoning applies for the poll performed in
    >>> the native wrapper when waking up from native and transitioning into
    >>> Java. It performs a conditional branch instead of an indirect load to
    >>> avoid signal-handler logic for polls that are not performance
    >>> critical.
    >> If we're talking about performance, the existing bytecode interpreter
    >> is exquisitely carefully coded, even going to the extent of having
    >> multiple dispatch tables for safepoint- and non-safepoint cases.
    >> Clearly the original authors considered that code performance
    >> critical, or they wouldn't have done what they did.  I
    >> suppose, though, that the design we have is from the early days when
    >> people diligently strove to make the interpreter as fast as possible.
    > On the other hand, branches have become a lot faster in "recent"
    > years, and this one is particularly trivial to predict. Therefore I
    > prefer to base design decisions on empirical measurements. And
    > introducing that complexity for an only marginally faster
    > interpreter poll does not seem worthwhile to me. Do you agree?
    Perhaps.  It's interesting that the result falls one way in compiled
    code and the other in interpreted code.  If the choice is so very
    finely balanced, though, it sort of makes sense.
    Andrew Haley
    Java Platform Lead Engineer
    Red Hat UK Ltd. <https://www.redhat.com>
    EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

More information about the hotspot-dev mailing list