[foreign] RFR 8210757: Add binder support for direct native invocation strategy

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Sep 14 18:04:48 UTC 2018

As mentioned in [1], this patch adds binder support for the so-called 
'direct' invocation scheme, which allows for better native 
downcall/upcall performance by means of specialized adapters. The core 
idea, also described in [1], is to define adapters of the kind:


Where long arguments come before double arguments (and to do this for 
each arity, e.g. <= 5).

If all arguments are passed in registers, then this reordering doesn't 
affect behavior, and greatly limits the number of permutations to be 
supported.
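
As a back-of-the-envelope illustration of the savings (this is not code 
from the patch, and the class and method names are made up): assuming 
every argument is either a long or a double, an arity-n signature has 
2^n possible orderings, but only n+1 shapes once longs are forced to 
come first.

```java
// Hypothetical helper, not part of the patch: counts argument shapes per arity.
final class ShapeCount {
    // Every argument is independently a long (J) or a double (D): 2^n orderings.
    static long unorderedShapes(int arity) {
        return 1L << arity;
    }

    // With longs forced before doubles, a shape is fully determined by the
    // number of longs (0..arity), so there are only arity + 1 of them.
    static long orderedShapes(int arity) {
        return arity + 1L;
    }

    // Sum over all arities from 0 to maxArity (return type not considered).
    static long total(int maxArity, boolean ordered) {
        long sum = 0;
        for (int n = 0; n <= maxArity; n++) {
            sum += ordered ? orderedShapes(n) : unorderedShapes(n);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("all orderings, arity <= 5: " + total(5, false));
        System.out.println("longs first,   arity <= 5: " + total(5, true));
    }
}
```

For arities up to 5 this gives 63 argument orderings versus 21 
longs-first shapes, before accounting for the return type.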

The downcall part (java to native) is relatively straightforward: the 
directNativeInvoker.cpp file defines a bunch of native entry points, one 
per shape, which cast the input address to a function pointer of the 
desired shape, and then call it:

jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong addr,
                           jlong arg0, jdouble arg1) {
     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
}
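
On the Java side, something has to pick the entry point matching a given 
signature. The sketch below shows one way a shape name such as "J_JD" 
could be derived from a MethodType. The naming scheme (return code, 
underscore, longs, then doubles) is only inferred from the single 
NI_invokeNative_J_JD example above, and the class and method names are 
hypothetical - this is not how the patch necessarily does it.

```java
import java.lang.invoke.MethodType;

// Hypothetical sketch: derives a shape name like "J_JD" from a method type.
final class ShapeNameSketch {
    // Maps a carrier type to a one-letter code (JNI-style descriptors).
    static char code(Class<?> c) {
        if (c == long.class) return 'J';
        if (c == double.class) return 'D';
        if (c == void.class) return 'V';
        throw new IllegalArgumentException("unsupported carrier: " + c);
    }

    // Builds "<ret>_<longs><doubles>", placing all longs before all doubles.
    static String shapeName(MethodType type) {
        StringBuilder longs = new StringBuilder();
        StringBuilder doubles = new StringBuilder();
        for (Class<?> p : type.parameterList()) {
            char ch = code(p);
            (ch == 'J' ? longs : doubles).append(ch);
        }
        return code(type.returnType()) + "_" + longs + doubles;
    }

    public static void main(String[] args) {
        MethodType mt = MethodType.methodType(long.class, double.class, long.class);
        System.out.println(shapeName(mt));
    }
}
```

Note that a (double, long) signature and a (long, double) signature map 
to the same shape, which is the point of the longs-first convention.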

The upcall business is a little trickier: first, if we are only to 
optimize upcalls where argument passing happens in registers, then it's 
crucial to note that by the time we get into the assembly stub, all the 
registers will have been populated by the native code to contain the 
right arguments in the right places. So we can avoid all the shuffling 
in the assembly adapter and simply jump onto a C function that looks 
like this:

long specialized_upcall_helper_J(long l0, long l1, long l2, long l3,
                                 double d0, double d1, double d2, double d3,
                                 unsigned int mask, jobject rec) { ... }

Note here that the first 8 arguments are just longs and doubles, and 
those will be expected to be in registers, according to the System V 
ABI. (On Windows, the situation will be a bit different as fewer integer 
registers are available, so this will need some work there).

So, to recap, the assembly upcall stub simply 'appends' the receiver 
object and a 'signature mask' in the last two available C registers and 
then jumps onto the helper function. The helper function will find all 
the desired arguments in the right places - there will be, in the 
general case, some unused arguments, but that's fine; after all, it 
didn't cost us anything to load them in the first place!

Note that we have three helper variants, one for each return type { 
long, double, void }. This is required as we need the C helper to return 
a value of the right type, which will generate the right assembly 
sequence to store the result in the right register (either an integer 
register or an XMM one).

So, with three helpers we can support all the shapes with up to 8 
arguments. On the Java side we have, of course, to define a specialized 
entry point for each shape.

All the magic for adapting method handles to and from the specialized 
adapters happens in the DirectSignatureShuffler class; this class is 
responsible for adapting each argument (e.g. from a Java to a native 
value), and then reordering the adapted method handle to match the order 
in which arguments are expected by the adapter (e.g. moving all longs to 
the front). The challenge was in making DirectSignatureShuffler fully 
symmetric - I did not want to have different code paths for upcalls 
and downcalls, so the code tries quite hard to be parametric in the 
shuffling direction (java->native or native->java) - which means that 
adapters will be applied one way or the inverse way depending on 
the shuffling direction (and on whether we are adapting an argument 
or a return). Since method handle filters are composable, it all works 
out quite beautifully.
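
To make the adapt-then-reorder idea concrete, here is a minimal sketch 
using only the standard java.lang.invoke API. It is not the 
DirectSignatureShuffler code: the adapter method, the int-to-long 
adaptation and the (double, int, long) Java signature are all invented 
for illustration, and only the java->native direction is shown.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustrative only: this is not the DirectSignatureShuffler code from the patch.
final class ShuffleSketch {
    // Stand-in for a specialized adapter of shape (long, long, double) -> long.
    static long adapter(long l0, long l1, double d0) {
        return l0 + l1 + (long) d0;
    }

    // Example per-argument adaptation: widen a Java int into a long.
    static long intToLong(int i) {
        return i;
    }

    // Adapts the (long, long, double) adapter so it can be called as (double, int, long).
    static MethodHandle adapt() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(ShuffleSketch.class, "adapter",
                MethodType.methodType(long.class, long.class, long.class, double.class));
        MethodHandle widen = lookup.findStatic(ShuffleSketch.class, "intToLong",
                MethodType.methodType(long.class, int.class));

        // Step 1: filter argument 0 so the handle accepts (int, long, double).
        MethodHandle filtered = MethodHandles.filterArguments(target, 0, widen);

        // Step 2: expose the "Java" order (double, int, long).
        // reorder[i] is the index of the new argument that feeds old argument i.
        MethodType javaType =
                MethodType.methodType(long.class, double.class, int.class, long.class);
        return MethodHandles.permuteArguments(filtered, javaType, 1, 2, 0);
    }

    // Convenience wrapper so callers don't have to deal with Throwable.
    static long call(double d, int i, long l) {
        try {
            return (long) adapt().invokeExact(d, i, l);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(2.0, 3, 4L)); // adapter(3, 4, 2.0) = 9
    }
}
```

For the opposite direction, the same two steps would presumably be 
applied with inverse filters and the inverse permutation, which is what 
makes a direction-parametric implementation possible.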

Note that the resulting, adapted MH is stored in a @Stable field to tell 
the JIT to optimize the heck out of it (as if it were a static constant).

This patch contains several other changes - which I discuss briefly below:

* we need to set up a framework in which new invocation strategies can 
be plugged in - note that we now have essentially 4 cases:

{ NativeInvoker, UpcallHandler } x { Universal, Direct }

When the code wants e.g. a NativeInvoker, it asks the NativeInvoker::of 
factory for one (UpcallHandler works in a similar way); this factory 
will attempt to go down the fast path - if an error occurs when 
computing the fast path, the call will fall back to the universal (slow) 
path.
Most of the changes you see in the Java code are associated with this 
refactoring - e.g. all clients of NativeInvoker/UpcallHandler should now 
go through the factory.

* CallbackImplGenerator had a major issue, since the new factory for 
NativeInvoker wants to bind an address eagerly (this is required e.g. to 
be forward compatible with the linkToNative backend); this means that at 
construction time we have to get the address of the callback, call the 
NativeInvoker factory and then stash the target method handle into a 
field of the anon callback class. Vlad tells me that fields of anon 
classes are always 'trusted' by the JIT, which means they should be 
treated as '@Stable' (note that I can't put a @Stable annotation there, 
since this code will be spun in user-land).

* There are a bunch of properties that can be set to either force the 
slow path or force the 'direct' path; in the latter case, if an error 
occurs when instantiating the direct wrapper, an exception is thrown. 
This mode is very useful for testing, and I have indeed tried to run all 
our tests with this flag enabled, to see how many places could not be 
optimized.

* I've also reorganized all the native code in hotspot/prims so that we 
have a separate file for each scheme (and so that native Java methods 
could be added where they really belong). This should also help in the 
long run as it should make adding/removing a given scheme easier.

* I've also added a small test which tries to pass structs of different 
sizes, but I will also work on a more complex test which will 
stress-test all invocation modes in a more complete fashion. With 
respect to testing, I've also done a fastdebug build and run all tests 
with that (as fastdebug catches many more hotspot assertions than the 
product version); everything passed.
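
As a rough illustration of the factory-with-fallback arrangement and the 
'force' modes described in the bullets above, here is a self-contained 
sketch. Everything in it is hypothetical: the Invoker interface is a 
stand-in for NativeInvoker, and the "sketch.invoker.mode" property and 
its values are invented (the real property names are not shown in this 
mail).

```java
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical sketch: names, the property key and its values are invented here.
final class InvokerFactorySketch {
    enum Mode { AUTO, UNIVERSAL, DIRECT }

    // Stand-in for NativeInvoker; the real interface looks different.
    interface Invoker {
        String kind();
    }

    // Reads a made-up system property selecting the strategy (defaults to AUTO).
    static Mode modeFromProperty() {
        String value = System.getProperty("sketch.invoker.mode", "auto");
        return Mode.valueOf(value.toUpperCase(Locale.ROOT));
    }

    // Tries the direct (fast) strategy first and falls back to the universal
    // one, unless the mode forces one of the two.
    static Invoker of(Mode mode, Supplier<Invoker> direct, Supplier<Invoker> universal) {
        if (mode == Mode.UNIVERSAL) {
            return universal.get();
        }
        try {
            return direct.get();
        } catch (RuntimeException ex) {
            if (mode == Mode.DIRECT) {
                throw ex; // forced direct mode: surface the failure instead of hiding it
            }
            return universal.get();
        }
    }

    public static void main(String[] args) {
        Supplier<Invoker> unsupported = () -> {
            throw new UnsupportedOperationException("shape not supported");
        };
        Supplier<Invoker> universal = () -> () -> "universal";
        System.out.println(of(modeFromProperty(), unsupported, universal).kind());
    }
}
```

The forced-direct mode is the one that matters for testing: a shape that 
silently fell back to the slow path in AUTO mode shows up as an 
exception instead.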



I'd like to thank Vladimir Ivanov for the prompt support whenever I got 
stuck down the macro assembler rabbit hole :-)


[1] - 
