[foreign] RFR 8210757: Add binder support for direct native invocation strategy

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Sep 28 11:19:05 UTC 2018

This is an updated version of the direct invocation scheme support. Very 
close to the last one, but there are some minor refactorings/improvements:

1) Added a @Stable annotation to DirectNativeInvoker's MH field
2) The box/unbox routines used by the UniversalXYZ strategies have been 
moved from NativeInvoker to UniversalNativeInvoker
3) I revamped the logic which detects whether the fast path is 
applicable - now we create the calling sequence first, and we use that 
to check whether we can fast-path it. Some internal benchmarks have 
shown that, with a large number of symbols, we were doing a lot of work 
because we always tried the fast path first and then, in case of an 
exception, fell back to the slow path; in such cases we would create 
the calling sequence twice. This new technique might also be more 
friendly w.r.t. Windows and other ABIs.
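
A minimal sketch of the "build the calling sequence once, then decide" 
idea in point 3. All names here (ArgClass, CallingSequence, 
fastPathApplicable, Strategy) are hypothetical stand-ins, not the 
binder's real types, and the applicability rule (register-only 
arguments, at most 4 integer and 4 floating-point ones) is only an 
illustration:

```java
import java.util.List;

final class FastPathCheckSketch {
    // Hypothetical argument classes: two register kinds plus "passed in memory".
    enum ArgClass { INTEGER, SSE, MEMORY }

    enum Strategy { DIRECT, UNIVERSAL }

    // Hypothetical stand-in for a calling sequence: one class per argument.
    static final class CallingSequence {
        final List<ArgClass> args;
        CallingSequence(List<ArgClass> args) { this.args = List.copyOf(args); }
    }

    // Counts how many calling sequences were built, to show it happens once per symbol.
    static int sequencesBuilt = 0;

    static CallingSequence buildSequence(List<ArgClass> args) {
        sequencesBuilt++;
        return new CallingSequence(args);
    }

    // Illustrative rule only: every argument lives in a register and fits the helper slots.
    static boolean fastPathApplicable(CallingSequence cs) {
        long ints = cs.args.stream().filter(a -> a == ArgClass.INTEGER).count();
        long sse = cs.args.stream().filter(a -> a == ArgClass.SSE).count();
        return ints + sse == cs.args.size() && ints <= 4 && sse <= 4;
    }

    // The sequence is computed exactly once and then inspected; no exception-driven retry.
    static Strategy choose(List<ArgClass> args) {
        CallingSequence cs = buildSequence(args);
        return fastPathApplicable(cs) ? Strategy.DIRECT : Strategy.UNIVERSAL;
    }

    public static void main(String[] unused) {
        System.out.println(choose(List.of(ArgClass.INTEGER, ArgClass.SSE)));
        System.out.println(choose(List.of(ArgClass.INTEGER, ArgClass.MEMORY)));
    }
}
```

The point of the sketch is only the control flow: the decision is a 
predicate over an already-built sequence, so the slow path can reuse 
that sequence instead of rebuilding it.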

I'd really like to move ahead with this (as this RFR has been out for 
quite a while now) - if there are no other comments I'll go ahead.


On 14/09/18 19:04, Maurizio Cimadamore wrote:
> Hi,
> as mentioned in [1], this patch adds binder support for the so-called 
> 'direct' invocation scheme, which allows for greater native invocation 
> downcall/upcall performance by means of specialized adapters. The 
> core idea, also described in [1], is to define adapters of the kind:
> invokeNative_V_DDDDD
> invokeNative_V_JDDDD
> invokeNative_V_JJDDD
> invokeNative_V_JJJDD
> invokeNative_V_JJJJD
> invokeNative_V_JJJJJ
> where long arguments come before double arguments (and this is done 
> for each arity, e.g. <=5).
> If all arguments are passed in registers, then this reordering doesn't 
> affect behavior, and it greatly limits the number of permutations to 
> be supported/generated.
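
To make the saving concrete, here is a small sketch that enumerates the 
normalized shape names for one return tag and one arity. The class and 
method names (ShapeNames, shapes) are made up for illustration; only 
the invokeNative_<ret>_<args> naming pattern is taken from the list 
above:

```java
import java.util.ArrayList;
import java.util.List;

final class ShapeNames {
    // Returns the adapter names for one return tag ('V', 'J' or 'D') and one arity,
    // with all long ('J') arguments placed before all double ('D') arguments.
    static List<String> shapes(char ret, int arity) {
        List<String> names = new ArrayList<>();
        for (int longs = 0; longs <= arity; longs++) {
            String args = "J".repeat(longs) + "D".repeat(arity - longs);
            names.add("invokeNative_" + ret + "_" + args);
        }
        return names;
    }

    public static void main(String[] unused) {
        // Prints the six arity-5 void shapes listed in the mail.
        shapes('V', 5).forEach(System.out::println);
    }
}
```

With longs sorted first, arity n needs n + 1 shapes per return type; 
without the reordering, every J/D sequence of length n would be a 
distinct shape, i.e. 2^n of them (32 instead of 6 for arity 5).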
> The downcall part (Java to native) is relatively straightforward: the 
> directNativeInvoker.cpp file defines a bunch of native entry points, 
> one per shape, which cast the input address to a function pointer of 
> the desired shape, and then call it:
> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong addr,
>                            jlong arg0, jdouble arg1) {
>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
> }
> The upcall business is a little trickier: first, if we are only to 
> optimize upcalls where argument passing happens in registers, then 
> it's crucial to note that by the time we get into the assembly stub, 
> all the registers will have been populated by the native code to 
> contain the right arguments in the right places. So we can avoid all 
> the shuffling in the assembly adapter and simply jump onto a C 
> function that looks like this:
> long specialized_upcall_helper_J(long l0, long l1, long l2, long l3,
>                                  double d0, double d1, double d2, double d3,
>                                  unsigned int mask, jobject rec) { ... }
> Note here that the first 8 arguments are just longs and doubles, and 
> those will be expected to be in registers, according to the System V 
> ABI. (On Windows, the situation will be a bit different as fewer 
> integer registers are available, so this will need some work there).
> So, to recap, the assembly upcall stub simply appends the receiver 
> object and a 'signature mask' in the last two available C registers 
> and then jumps onto the helper function. The helper function will find 
> all the desired arguments in the right places - there will be, in the 
> general case, some unused arguments, but that's fine, after all it 
> didn't cost us anything to load them in the first place!
> Note that we have three helper variants, one for each return type { 
> long, double, void }. This is required as we need the C helper to 
> return a value of the right type, which will generate the right 
> assembly sequence to store the result in the right register (either 
> integer or XMM).
> So, with three helpers we can support all the shapes with up to 8 
> arguments. On the Java side we have, of course, to define a 
> specialized entry point for each shape.
> All the magic for adapting method handles to and from the specialized 
> adapters happens in the DirectSignatureShuffler class; this class is 
> responsible for adapting each argument, e.g. from Java to native value, 
> and then reordering the adapted method handle to match the order in 
> which arguments are expected by the adapter (e.g. move all longs in 
> front). The challenge was in making DirectSignatureShuffler fully 
> symmetric - e.g. I did not want to have different code paths for 
> upcalls and downcalls, so the code tries quite hard to be parametric 
> in the shuffling direction (java->native or native->java) - which 
> means that adapters will be applied in one way or in the inverse way 
> depending on the shuffling direction (and as to whether we are 
> adapting an argument or a return). Since method handle filters are 
> composable, it all works out quite beautifully.
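
The reordering step alone can be sketched with the standard 
MethodHandles.permuteArguments combinator. This is not the 
DirectSignatureShuffler code: LongsFirst, adapterJJD and 
longsFirstOrder are hypothetical names, the per-argument Java/native 
value conversion is left out, and only one shuffling direction is 
shown:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LongsFirst {
    // Hypothetical stand-in for a specialized adapter of shape J_JJD (longs before doubles).
    static long adapterJJD(long l0, long l1, double d0) {
        return l0 * 100 + l1 * 10 + (long) d0;
    }

    // For each adapter parameter (longs first, then doubles), records which caller
    // parameter supplies it. This is the 'reorder' array expected by permuteArguments.
    static int[] longsFirstOrder(MethodType callerType) {
        int n = callerType.parameterCount();
        int[] reorder = new int[n];
        int next = 0;
        for (int i = 0; i < n; i++) {
            if (callerType.parameterType(i) == long.class) reorder[next++] = i;
        }
        for (int i = 0; i < n; i++) {
            if (callerType.parameterType(i) == double.class) reorder[next++] = i;
        }
        if (next != n) throw new IllegalArgumentException("only long/double supported here");
        return reorder;
    }

    // Produces a handle with the caller's parameter order that feeds the longs-first adapter.
    static MethodHandle adapt(MethodHandle adapter, MethodType callerType) {
        return MethodHandles.permuteArguments(adapter, callerType, longsFirstOrder(callerType));
    }

    // Calls the adapter through a (long, double, long) view; the adapter sees (1, 2, 3.0).
    static long demo() {
        try {
            MethodHandle adapter = MethodHandles.lookup().findStatic(
                    LongsFirst.class, "adapterJJD",
                    MethodType.methodType(long.class, long.class, long.class, double.class));
            MethodType callerType =
                    MethodType.methodType(long.class, long.class, double.class, long.class);
            return (long) adapt(adapter, callerType).invokeExact(1L, 3.0, 2L);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] unused) {
        System.out.println(demo()); // prints 123
    }
}
```

Because the result is itself a plain method handle, further 
per-argument filters can be layered on it with the usual combinators, 
which is the composability the paragraph above relies on.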
> Note that the resulting, adapted MH is stored in a @Stable field to 
> tell the JIT to optimize the heck out of it (as if it were a static 
> constant).
> This patch contains several other changes - which I discuss briefly 
> below:
> * we need to set up a framework in which new invocation strategies can 
> be plugged in - note that we now have essentially 4 cases:
> { NativeInvoker, UpcallHandler } x { Universal, Direct }
> When the code wants e.g. a NativeInvoker, it asks the 
> NativeInvoker::of factory for one (UpcallHandler works in a similar 
> way); this factory will attempt to go down the fast path - if an error 
> occurs when computing the fast path, the call will fall back to the 
> universal (slow) path.
> Most of the changes you see in the Java code are associated with this 
> refactoring - e.g. all clients of NativeInvoker/UpcallHandler should 
> now go through the factory.
> * CallbackImplGenerator had a major issue, since the new factory for 
> NativeInvoker wants to bind an address eagerly (this is required e.g. 
> to be forward compatible with the linkToNative backend); this means 
> that at construction time we have to get the address of the callback, 
> call the NativeInvoker factory and then stash the target method handle 
> into a field of the anon callback class. Vlad tells me that fields of 
> anon classes are always 'trusted' by the JIT, which means they should 
> be treated as '@Stable' (note that I can't put a @Stable annotation 
> there, since this code will be spun in user-land).
> * There are a bunch of properties that can be set to either force slow 
> path or force 'direct' path; in the latter case, if an error occurs 
> when instantiating the direct wrapper, an exception is thrown. This 
> mode is very useful for testing, and I indeed have tried to run all 
> our tests with this flag enabled, to see how many places could not be 
> optimized.
> * I've also reorganized all the native code in hotspot/prims so that 
> we have a separate file for each scheme (and so that native Java 
> methods could be added where they really belong). This should also 
> help in the long run as it should make adding/removing a given scheme 
> easier.
> * I've also added a small test which tries to pass structs of 
> different sizes, but I will also work on a more complex test which 
> will stress-test all invocation modes in a more complete fashion. With 
> respect to testing, I've also done a fastdebug build and run all tests 
> with that (as fastdebug catches many more HotSpot assertions than 
> the product version); everything passed.
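
The factory behavior described in the bullets above (try the direct 
path, fall back to universal, with switches to force either one) can 
be sketched as follows. The Mode enum, the select method and the use 
of UnsupportedOperationException are assumptions made for the example; 
the actual property names and exception types are not given in this 
mail:

```java
import java.util.function.Supplier;

final class InvokerModeSketch {
    // Hypothetical modes mirroring "default", "force direct" and "force slow path".
    enum Mode { AUTO, FORCE_DIRECT, FORCE_UNIVERSAL }

    // 'direct' may fail with UnsupportedOperationException when no specialized shape fits.
    static <T> T select(Mode mode, Supplier<T> direct, Supplier<T> universal) {
        switch (mode) {
            case FORCE_UNIVERSAL:
                return universal.get();
            case FORCE_DIRECT:
                // No fallback: a failure propagates, which is what makes this mode
                // useful for finding call sites that cannot be optimized.
                return direct.get();
            default:
                try {
                    return direct.get();
                } catch (UnsupportedOperationException ex) {
                    return universal.get();
                }
        }
    }

    public static void main(String[] unused) {
        Supplier<String> failing = () -> {
            throw new UnsupportedOperationException("no direct shape");
        };
        System.out.println(select(Mode.AUTO, failing, () -> "universal"));
    }
}
```

The same selection logic would apply to both NativeInvoker and 
UpcallHandler, since the description above says the two factories work 
in a similar way.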
> Webrev:
> http://cr.openjdk.java.net/~mcimadamore/panama/8210757/
> I'd like to thank Vladimir Ivanov for the prompt support whenever I 
> got stuck down the macro assembler rabbit hole :-)
> Cheers
> Maurizio
> [1] - 
> http://mail.openjdk.java.net/pipermail/panama-dev/2018-September/002652.html
