[foreign-memaccess+abi] RFR: 8262118: Specialize upcalls [v3]
vlivanov at openjdk.java.net
Fri Feb 26 16:42:58 UTC 2021
On Tue, 23 Feb 2021 15:54:16 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
>> This patch adds specialization of upcalls, which includes both the specialization of the binding recipe using MethodHandle combinators, as for downcalls, as well as VM support for generating a customized wrapper for a 'low-level' method handle taking and returning only primitive types, which can be used as a base handle in the binding specialization.
>> The latter is based on Vladimir Ivanov's earlier linkToNative work, with a few changes to the frame layout, exception handling, and added support for attaching non-java threads, as well as re-writing it to support an arbitrary caller ABI.
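For readers unfamiliar with the combinator approach, here is a minimal sketch of what "specializing the binding recipe using MethodHandle combinators" can look like in general. It is not the code from this patch: the class `UpcallSpecializationSketch`, the methods `target`, `unboxArg`, `boxReturn`, and the lookup table are all hypothetical names made up for illustration. Only the `java.lang.invoke` calls are real API. The idea is to wrap a high-level target so that the resulting handle takes and returns only primitives, which is the shape a VM-generated stub could call directly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class UpcallSpecializationSketch {
    // Hypothetical stand-in for state that a primitive argument refers to.
    static final String[] TABLE = {"a", "bc"};

    // Hypothetical high-level Java target of the upcall.
    static String target(String name, int times) {
        return name.repeat(times);
    }

    // Hypothetical binding step: primitive argument -> Java object.
    static String unboxArg(long index) {
        return TABLE[(int) index];
    }

    // Hypothetical binding step: Java return value -> primitive.
    static long boxReturn(String s) {
        return s.length();
    }

    // Composes the binding steps around the target once, up front,
    // instead of interpreting them on every call.
    static MethodHandle specialize() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        Class<?> self = UpcallSpecializationSketch.class;
        MethodHandle target = lookup.findStatic(self, "target",
                MethodType.methodType(String.class, String.class, int.class));
        MethodHandle unboxArg = lookup.findStatic(self, "unboxArg",
                MethodType.methodType(String.class, long.class));
        MethodHandle boxReturn = lookup.findStatic(self, "boxReturn",
                MethodType.methodType(long.class, String.class));

        // (String,int)String -> (long,int)String
        MethodHandle adapted = MethodHandles.filterArguments(target, 0, unboxArg);
        // (long,int)String -> (long,int)long: primitives only.
        return MethodHandles.filterReturnValue(adapted, boxReturn);
    }

    // Convenience wrapper that invokes the specialized handle.
    static long call(long index, int times) {
        try {
            return (long) specialize().invokeExact(index, times);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(1L, 3));
    }
}
```

Because the composed handle is built once and has a fixed primitive signature, the JIT can inline through it, which is presumably where much of the gain over interpreted bindings comes from.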
>> The current implementation supports x86 only for now - though I've made sure all the tests still pass on AArch64. Stack argument passing is disabled, as for downcalls - there is a mismatch between the encoding for stack slots we use on the Java side, and the encoding we use in the VM, which needs some more consideration. And finally, as with downcalls, multi-register returns are disabled as well. We need to figure out what the right protocol for those should be, maybe taking inspiration from Valhalla.
>> In the code I've moved some things from ProgrammableInvoker to SharedUtils in order to reuse for upcalls. I also added a bit of debugging code to BufferLayout for dumping stack arguments. Note that there is quite a bit of motion in ProgrammableUpcallHandler as well. This is mostly to untangle the part of an upcall that is specialized using MethodHandle combinators (invokeInterpBindings), from the part that is replaced by the VM's wrapper stub (invokeMoves). I've for instance introduced a simple helper class, called InterpretedHandler, which acts as a wrapper around the target MethodHandle for doing the old-style interpreted calls, instead of using the whole ProgrammableUpcallHandler instance.
>> I've also removed some of the test configurations. We now only test the default settings of various specialization and intrinsification flags when running TestDowncall, TestUpcall, and TestUpcallHighArity. This patch adds 2 more of these flags, making a total of 16 combinations, which would take way too long to test, and is probably not necessary. I gave the full testing matrix a run, and found no issues. We can repeat that process periodically to check that all the combinations still work. I've also added some more benchmark cases for the upcalls benchmark.
>> I also gave the Windows jextract samples a try with this patch and found no issues.
>> Here are some of the benchmark numbers (from Linux-x64):
>> Benchmark                Mode  Cnt    Score   Error  Units
>> Upcalls.jni_args10       avgt   30  134.220 ± 1.553  ns/op
>> Upcalls.jni_args5        avgt   30   87.340 ± 1.753  ns/op
>> Upcalls.jni_blank        avgt   30   57.590 ± 0.877  ns/op
>> Upcalls.jni_identity     avgt   30  109.859 ± 0.930  ns/op
>> Upcalls.panama_args10    avgt   30   30.606 ± 0.088  ns/op
>> Upcalls.panama_args5     avgt   30   23.908 ± 0.121  ns/op
>> Upcalls.panama_blank     avgt   30   20.314 ± 0.137  ns/op
>> Upcalls.panama_identity  avgt   30   24.894 ± 0.111  ns/op
>> While these are the numbers with the optimizations disabled (quite a bit worse):
>> Upcalls.panama_args10    avgt   30  886.885 ± 22.133  ns/op
>> Upcalls.panama_args5     avgt   30  566.600 ± 12.442  ns/op
>> Upcalls.panama_blank     avgt   30  212.962 ±  2.386  ns/op
>> Upcalls.panama_identity  avgt   30  361.358 ±  3.212  ns/op
>> We beat JNI across the board, and also seem to scale a lot better for larger numbers of arguments. So, this seems like a success :)
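To put a rough number on the scaling claim, the optimized scores quoted above can be turned into a marginal cost per argument. This is a back-of-the-envelope sketch, assuming the `_blank` benchmarks pass no arguments, so that subtracting them isolates the argument-passing overhead. The class and method names are hypothetical; only the scores come from the quoted results.

```java
final class UpcallScalingSketch {
    // Marginal cost per argument: time added over the no-argument
    // baseline, divided by the number of arguments.
    static double perArgCost(double withArgs, double blank, int argCount) {
        return (withArgs - blank) / argCount;
    }

    // How many times faster the Panama upcall is than the JNI one.
    static double speedup(double jni, double panama) {
        return jni / panama;
    }

    public static void main(String[] args) {
        // Scores in ns/op, taken from the optimized benchmark run above.
        double jniBlank = 57.590, jniArgs10 = 134.220;
        double panamaBlank = 20.314, panamaArgs10 = 30.606;

        System.out.printf("JNI per-arg cost:    %.3f ns%n",
                perArgCost(jniArgs10, jniBlank, 10));
        System.out.printf("Panama per-arg cost: %.3f ns%n",
                perArgCost(panamaArgs10, panamaBlank, 10));
        System.out.printf("Speedup at 10 args:  %.2fx%n",
                speedup(jniArgs10, panamaArgs10));
        System.out.printf("Speedup at 0 args:   %.2fx%n",
                speedup(jniBlank, panamaBlank));
    }
}
```

Under that assumption, each extra argument costs roughly 7.7 ns with JNI versus roughly 1.0 ns with the specialized upcall, and the overall advantage grows from about 2.8x with no arguments to about 4.4x with ten.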
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> Address review comments
Marked as reviewed by vlivanov (Committer).