[foreign-memaccess+abi] RFR: Add a benchmark for strlen using Foreign Linker API

Maurizio Cimadamore mcimadamore at openjdk.java.net
Wed Feb 17 15:32:14 UTC 2021

On Wed, 17 Feb 2021 15:17:31 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> I've been spending some time looking into this issue:
> https://bugs.openjdk.java.net/browse/JDK-8261828
> To understand the problem better, I put together a (hopefully) comprehensive benchmark of the strlen function. It turns out that the strlen call itself is fast; the benchmark spends most of its time converting the Java string to a native string.
> While playing with the benchmark, I came up with alternative ways to do this conversion which greatly improve the benchmark results, even surpassing (at least on my machine) what's possible with JNI. Jorn and I think that, for future reference, it would be a good idea to include this benchmark in our suite.
> For the curious, there are many factors which make the default `CLinker::toCString` go slower than expected: 
> * allocating a fresh segment on each iteration is expensive, because it takes two native calls (malloc, memset), plus a bunch of CAS operations to reserve memory in the Java runtime
> * freeing the segment on each iteration is equally expensive - one native call (free), plus again, some CAS to unreserve memory 
> * bulk copy is fast, but again requires a native call
> * all in all, we need 4 native calls per iteration (malloc, memset, copy, free) each adding cost when it comes to state transitions
> In other words, JNI has two advantages here: (i) JNI provides a lower level of safety (e.g. the runtime doesn't need to track allocated memory, as it does for segments); and (ii) when we call the JNI-ified strlen function, the malloc, copy and free happen while we're already in native code, which means fewer state transitions are required.
> Note that we can eliminate (i) by creating restricted segments using `CLinker::allocateMemoryRestricted` (which does a plain malloc). We can also eliminate (ii) by creating *trivial* function descriptors for the calls to malloc/free/strlen, thereby removing the cost associated with state transitions there. Both routes are tested in the benchmark (note that both require some willingness to embrace restricted methods). I have also put together a variant which shows how NativeScope can be used to speed up allocation (which works really well for small strings, and is _not_ restricted).
> What are the lessons learned for plain `CLinker::toCString` ? 
> * While the logic is generally fast, all the state transitions and unsafe calls are killing performance in such a tight scenario; it is perhaps worth considering intrinsifying Unsafe::allocateMemory/copyMemory/setMemory/freeMemory.
> * The way to go, performance-wise, is not to rely on the default malloc-based allocation. This is where Panama has a big edge over JNI, whose allocation logic is _fixed_. Proposals such as the one described in [1] will make passing custom allocators to `CLinker::toCString` easier, so that clients can decide which allocation strategy best fits their use case.
> [1] - https://inside.java/2021/01/25/memory-access-pulling-all-the-threads/

For the record, the benchmark results on my machine are as follows:

Benchmark                                (size)  Mode  Cnt    Score   Error  Units
StrLenTest.jni_strlen                         5  avgt   30   47.952 ± 0.899  ns/op
StrLenTest.jni_strlen                        20  avgt   30   61.918 ± 2.668  ns/op
StrLenTest.jni_strlen                       100  avgt   30  135.449 ± 1.454  ns/op
StrLenTest.panama_strlen                      5  avgt   30  115.883 ± 2.194  ns/op
StrLenTest.panama_strlen                     20  avgt   30  114.238 ± 1.896  ns/op
StrLenTest.panama_strlen                    100  avgt   30  133.208 ± 2.056  ns/op
StrLenTest.panama_strlen_scope                5  avgt   30   34.467 ± 0.596  ns/op
StrLenTest.panama_strlen_scope               20  avgt   30   50.872 ± 2.357  ns/op
StrLenTest.panama_strlen_scope              100  avgt   29   89.604 ± 3.675  ns/op
StrLenTest.panama_strlen_unsafe               5  avgt   30   52.222 ± 3.626  ns/op
StrLenTest.panama_strlen_unsafe              20  avgt   30   55.937 ± 2.125  ns/op
StrLenTest.panama_strlen_unsafe             100  avgt   30   67.084 ± 0.671  ns/op
StrLenTest.panama_strlen_unsafe_trivial       5  avgt   30   29.762 ± 0.443  ns/op
StrLenTest.panama_strlen_unsafe_trivial      20  avgt   30   36.156 ± 0.369  ns/op
StrLenTest.panama_strlen_unsafe_trivial     100  avgt   30   59.830 ± 4.222  ns/op
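
As a rough illustration of why the scope-based variant helps for small strings, the sketch below shows the general bump-allocation idea: reserve one off-heap block up front, then carve each NUL-terminated string out of it, so that no per-call malloc/memset/free is needed. This is not the NativeScope implementation and does not use the Foreign Linker API; the class `BumpCStringScope` and its methods are hypothetical names, it uses only `java.nio.ByteBuffer`, and its `strlen` is a plain Java stand-in for the native function.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical bump allocator for C strings; illustration only. */
final class BumpCStringScope {
    // One off-heap block, allocated (and zeroed) once when the scope is created.
    private final ByteBuffer block;

    BumpCStringScope(int capacity) {
        this.block = ByteBuffer.allocateDirect(capacity);
    }

    /** Copies s plus a NUL terminator into the block and returns a view of those bytes. */
    ByteBuffer toCString(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int length = bytes.length + 1; // room for the trailing NUL
        if (block.remaining() < length) {
            throw new IllegalStateException("scope exhausted");
        }
        int start = block.position();
        block.put(bytes);      // bulk copy; advances the position ("bump")
        block.put((byte) 0);   // explicit terminator, since the block may have been reused
        ByteBuffer view = block.duplicate();
        view.position(start);
        view.limit(start + length);
        return view.slice();
    }

    /** Makes the whole block available again; earlier views become stale. */
    void reset() {
        block.clear();
    }

    /** Java stand-in for strlen: counts bytes up to the first NUL. */
    static int strlen(ByteBuffer cString) {
        int n = 0;
        while (cString.get(n) != 0) {
            n++;
        }
        return n;
    }
}
```

Allocation here is just a bounds check and a position update, which is why this style of scheme tends to pay off most when the strings are short and the copy itself is cheap. The trade-off is that memory is only reclaimed in bulk, via `reset()`.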


PR: https://git.openjdk.java.net/panama-foreign/pull/454
