[foreign-memaccess+abi] RFR: 8262118: Specialize upcalls [v2]

Maurizio Cimadamore mcimadamore at openjdk.java.net
Tue Feb 23 11:17:54 UTC 2021

On Mon, 22 Feb 2021 14:30:27 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> Hi,
>> This patch adds specialization of upcalls, which includes both the specialization of the binding recipe using MethodHandle combinators, as for downcalls, as well as VM support for generating a customized wrapper for a 'low-level' method handle taking and returning only primitive types, which can be used as a base handle in the binding specialization.
>> The latter is based on Vladimir Ivanov's earlier linkToNative work, with a few changes to the frame layout, exception handling, and added support for attaching non-java threads, as well as re-writing it to support an arbitrary caller ABI.
>> The current implementation supports x86 only for now - though I've made sure all the tests still pass on AArch64. Stack argument passing is disabled, as for downcalls - there is a mismatch between the encoding for stack slots we use on the Java side, and the encoding we use in the VM, which needs some more consideration. And finally, as with downcalls, multi-register returns are disabled as well. We need to figure out what the right protocol for those should be, maybe taking inspiration from Valhalla.
>> In the code I've moved some things from ProgrammableInvoker to SharedUtils in order to reuse for upcalls. I also added a bit of debugging code to BufferLayout for dumping stack arguments. Note that there is quite a bit of motion in ProgrammableUpcallHandler as well. This is mostly to untangle the part of an upcall that is specialized using MethodHandle combinators (invokeInterpBindings), from the part that is replaced by the VM's wrapper stub (invokeMoves). I've for instance introduced a simple helper class, called InterpretedHandler, which acts as a wrapper around the target MethodHandle for doing the old-style interpreted calls, instead of using the whole ProgrammableUpcallHandler instance.
>> I've also removed some of the test configurations. We now only test the default settings of various specialization and intrinsification flags when running TestDowncall, TestUpcall, and TestUpcallHighArity. This patch 2 more of these flags, making a total of 16 combinations, which would take way too long to test, and is probably not necessary. I gave the full testing matrix a run, and found no issues. We can repeat that process periodically to check that all the combinations still work. I've also added some more benchmark cases for the upcalls benchmark.
>> I also gave the Windows jextract samples a try with this patch and found no issues.
>> Here are some of the benchmark numbers (from Linux-x64):
>> Benchmark                Mode  Cnt    Score   Error  Units
>> Upcalls.jni_args10       avgt   30  134.220 ? 1.553  ns/op
>> Upcalls.jni_args5        avgt   30   87.340 ? 1.753  ns/op
>> Upcalls.jni_blank        avgt   30   57.590 ? 0.877  ns/op
>> Upcalls.jni_identity     avgt   30  109.859 ? 0.930  ns/op
>> Upcalls.panama_args10    avgt   30   30.606 ? 0.088  ns/op
>> Upcalls.panama_args5     avgt   30   23.908 ? 0.121  ns/op
>> Upcalls.panama_blank     avgt   30   20.314 ? 0.137  ns/op
>> Upcalls.panama_identity  avgt   30   24.894 ? 0.111  ns/op
>> While these are  the numbers with the optimizations disabled (quite a bit worse):
>> Upcalls.panama_args10    avgt   30  886.885 ? 22.133  ns/op
>> Upcalls.panama_args5     avgt   30  566.600 ? 12.442  ns/op
>> Upcalls.panama_blank     avgt   30  212.962 ?  2.386  ns/op
>> Upcalls.panama_identity  avgt   30  361.358 ?  3.212  ns/op
>> We beat JNI across the board, and also seem to scale a lot better for larger numbers of arguments. So, this seems like a success :)
>> Thanks,
>> Jorn
> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision:
>  - Tab to spaces
>  - Added TestMatrix test that test all configurations of TestUpcall TestDowncall and TestUpcallHightArity

Looks a very solid piece of work (I've mostly checked the Java side). I like how the code has been consolidated between upcall and downcall paths. I've added some minor comments around for your consideration.

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/ProgrammableInvoker.java line 243:

> 241: 
> 242:         if (bufferCopySize > 0) {
> 243:             specializedHandle = SharedUtils.wrapWithAllocator(specializedHandle, argAllocatorPos, bufferCopySize, false);

The variable leafType in this method is unused (I couldn't leave the comment there - for some reason GitHub doesn't allow me to add comments outside the highlighted areas...)

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/ProgrammableInvoker.java line 314:

> 312:                                 Map<VMStorage, Integer> argIndexMap,
> 313:                                 Map<VMStorage, Integer> retIndexMap) throws Throwable {
> 314:         try (Allocator unboxAllocator = SharedUtils.makeAllocator(bufferCopySize)) {

I did a similar simplification in the resourceScope branch

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/aarch64/CallArranger.java line 325:

> 323:                             VMStorage storage = regs[i];
> 324:                             final long size = group.memberLayouts().get(i).byteSize();
> 325:                             Class<?> type = SharedUtils.primitiveCarrierForSize(size, false);

Not sure this is ok - this is a struct of floats passed by register?

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/aarch64/CallArranger.java line 424:

> 422:                             VMStorage storage = regs[i];
> 423:                             final long size = group.memberLayouts().get(i).byteSize();
> 424:                             Class<?> type = SharedUtils.primitiveCarrierForSize(size, false);

Same here - aren't HFA passed by (vector) registers?

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/x64/sysv/CallArranger.java line 274:

> 272:                             bindings.dup();
> 273:                         }
> 274:                         boolean useFloat = storage.type() == StorageClasses.VECTOR;

So , my understanding here is that we had issues all along - e.g. this is unrelated to the upcall work?

test/jdk/TEST.groups line 341:

> 339: jdk_foreign = \
> 340:     java/foreign \
> 341:     -java/foreign/TestMatrix.java

Thanks for setting this up!

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/ProgrammableUpcallHandler.java line 134:

> 132:             MethodHandle doBindingsErased = doBindings.asSpreader(Object[].class, doBindings.type().parameterCount());
> 133:             MethodHandle invokeMoves = insertArguments(MH_invokeMoves, 1, doBindingsErased, argMoves, retMoves, abi, layout);
> 134:             InterpretedHandler handler = new InterpretedHandler(invokeMoves);

In principle, this could be just replaced by the method handle no? Then we could have a static entry point in this class which can be used by the VM to "call" the method handle with given args. E.g. do we need the `InterpreterHandler` wrapper?


PR: https://git.openjdk.java.net/panama-foreign/pull/457

More information about the panama-dev mailing list