RFR(L): 8185265 [MVT] improve performance of return of value types with new calling convention

Sergey Kuksenko sergey.kuksenko at oracle.com
Fri Aug 4 23:27:25 UTC 2017

I've checked that patch. It definitely makes a huge improvement for MH 
startup time.

Unfortunately, some modification made within the last week causes an 
additional slowdown in startup (I will investigate later).

So, some numbers:

1. time to performance of the single CountedLoop:
- was a week ago - 45 seconds
- current build  - 62 seconds
- current build + Roland's patch - 14 seconds

2. time to performance of the nest of 3 CountedLoop:
- was a week ago - 220 seconds
- current build  - ~400 seconds
- current build + Roland's patch - 70 seconds
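
Read as ratios, the figures above put the patched build at roughly 4.4x (single loop) and 5.7x (nest of 3) faster to warm up than the current build. A minimal sketch of that arithmetic follows; the class and method names are made up for illustration, and the 400-second figure is approximate in the measurements, so the second ratio is approximate too.

```java
import java.util.Locale;

public class WarmupSpeedup {
    /** Ratio of warmup times: how many times faster the patched build warms up. */
    static double speedup(double baselineSeconds, double patchedSeconds) {
        return baselineSeconds / patchedSeconds;
    }

    static String format(double ratio) {
        return String.format(Locale.ROOT, "%.1fx", ratio);
    }

    public static void main(String[] args) {
        // Timings in seconds, copied from the lists above.
        System.out.println("single loop, current vs patched: " + format(speedup(62, 14)));
        System.out.println("nest of 3,   current vs patched: " + format(speedup(400, 70)));
        // Slowdown of the current build relative to the build from a week ago.
        System.out.println("single loop, current vs week ago: " + format(speedup(62, 45)));
        System.out.println("nest of 3,   current vs week ago: " + format(speedup(400, 220)));
    }
}
```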

On 08/04/2017 12:42 PM, Roland Westrelin wrote:
> (that patch includes some runtime/interpreter changes)
> http://cr.openjdk.java.net/~roland/8185265/webrev.00/
> JDK-8184795 disabled compilation of LFs as compilation root if they
> return a value type. The logic in CheckCastPPNode::Ideal() expects
> values to be returned in registers but if the called method is a lambda
> form, a value can be returned as a buffered value. I fixed the logic in
> CheckCastPPNode::Ideal() to properly handle return of a buffered value.
> One of Sergey's microbenchmarks takes forever to warm up. I found it's
> related to the return of value types and, in particular, calling the
> runtime to pack/unpack values (i.e. load fields in registers on return,
> allocate a buffered value and initialize it once returned to the
> caller).  It causes a ~10x slowdown during warmup. To address this, the
> VM now generates little pieces of assembly code to perform the packing
> or unpacking. On return, unpacking a buffered value is performed by
> calling the value klass's unpack handler. Once back in the caller,
> a fast path allocation is attempted for a new value from the TLAB (I
> left off-heap buffers out for now). If that allocation succeeds, the
> value klass's pack handler is called and initializes the value. If the
> allocation fails, we fall back to the same runtime call that is used
> currently (that call knows there can be live oops, where to look for
> them and how to preserve them across a GC).
> C2 doesn't support calling to an address that is loaded in a register. I
> hacked support for it from the existing CallLeafNoFP node. It's somewhat
> ugly but I would say it's good enough for now.
> With this change, AFAICT, the warmup issue caused by return of value
> types in Sergey's test is gone. It doesn't mean it warms up fast (we
> still have some disabled compilation of LF for another reason) but it's
> now decent.
> Roland.
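
The caller-side sequence Roland describes (try a TLAB allocation, run the pack handler on success, otherwise fall back to the runtime call) can be modelled roughly as below. This is only a toy Java model under stated assumptions, not the HotSpot code: `Tlab`, `PackHandler`, `POINT_PACK`, `runtimeAllocateAndPack` and `bufferReturnedValue` are hypothetical names, registers are modelled as a `long[]`, and the real stubs are generated assembly.

```java
import java.util.Arrays;

public class ValueReturnSketch {
    /** Hypothetical stand-in for a TLAB: a bump pointer over a fixed region of words. */
    static final class Tlab {
        final long[] words;
        int top; // index of the first free word

        Tlab(int capacityWords) { this.words = new long[capacityWords]; }

        /** Returns the start index of the new block, or -1 if it does not fit. */
        int tryAllocate(int sizeWords) {
            if (sizeWords > words.length - top) return -1;
            int start = top;
            top += sizeWords;
            return start;
        }
    }

    /** Hypothetical per-klass pack handler: stores field values held in "registers" into a buffer. */
    interface PackHandler { void pack(long[] regs, long[] dest, int offset); }

    /** Pack handler for an assumed two-field value type. */
    static final PackHandler POINT_PACK = (regs, dest, offset) -> {
        dest[offset] = regs[0];
        dest[offset + 1] = regs[1];
    };

    static int slowPathCalls = 0;

    /** Stand-in for the existing runtime call used when the fast path fails. */
    static long[] runtimeAllocateAndPack(long[] regs) {
        slowPathCalls++;
        return Arrays.copyOf(regs, regs.length);
    }

    /** Fast path first; fall back to the runtime only if the TLAB allocation fails. */
    static long[] bufferReturnedValue(Tlab tlab, long[] regs, PackHandler handler) {
        int offset = tlab.tryAllocate(regs.length);
        if (offset >= 0) {
            handler.pack(regs, tlab.words, offset);
            // Copy the buffered fields out so the demo can print them.
            return Arrays.copyOfRange(tlab.words, offset, offset + regs.length);
        }
        return runtimeAllocateAndPack(regs);
    }

    public static void main(String[] args) {
        Tlab tlab = new Tlab(3);   // room for one two-word value only
        long[] regs = {7, 9};
        System.out.println(Arrays.toString(bufferReturnedValue(tlab, regs, POINT_PACK)));
        System.out.println(Arrays.toString(bufferReturnedValue(tlab, regs, POINT_PACK)));
        System.out.println("slow path calls: " + slowPathCalls);
    }
}
```

In this model the first return is buffered on the fast path and the second one, which no longer fits in the toy TLAB, takes the slow path once.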

Best regards,
Sergey Kuksenko
