[foreign-jextract] RFR: 8261642: improve jextract source generation mode and remove class generation

Maurizio Cimadamore mcimadamore at openjdk.java.net
Fri Feb 12 22:33:10 UTC 2021

This patch replaces the existing code generators used by jextract by a single, unified, source generation strategy.

As discussed in [1], there are some trade-offs between source generation and classfile generation; for instance, when using classfile translation, we can take advantage of certain features of the JVM bytecode, such as dynamic constants, which makes it easy to generate big files full of constants which are only initialized on a *by-need* basis.

Classfile generation is not without issue though; first, it is almost always easier to deal with sourcefiles rather than classes - both in terms of building environment and also in terms of sharing (e.g. uploading generated bindings on a GitHub project). This is for instance why we have always used source-based bindings for libclang to power jextract. We feel that users will probably be subject to similar limitations.

Secondly, IDEs can sometimes be picky about what's in the classfile - and I have seen cases of IntelliJ failing to decompile classes containing dynamic constants.

Last, but not least, classfiles cannot contain information which would be helpful for the user, such as javadoc. Adding javadoc comments, containing at the very least the original signatures of the C functions/structs would be very helpful when developing applications which use jextract generated code.

For these reasons, I have decided to explore a source-based strategy for jextract. Note that this does not alter the set of options available in the command line, or the default behavior of jextract: that is, jextract will still emit classfiles if you do not specify --sources; but instead of having _two_ different backends (as we do now), this patch replaces both backends with a single source-based backend. We can discuss, as followup changes, as to whether we should just make source generation the default and drop classfile generation, but that's for another day.

The basic idea behind the approach is to stash constants in a separate class; that is, when we need a native method handle, we can generate a class which contains all the constants relative to that native method handle. This allows us to only load the constants that are used for a given native invocation.

One obvious problem of this approach is the sheer number of classes that it generates (although only a subset of them are truly loaded and initialized). To counteract that, this patch adopts several strategies:

1. first, it reuses a container class, if that's already available, for storing the constants. For instance, if we are generating a struct class, we can just emit the constants inside the struct class, since the constants are private to that struct anyway (we cannot do that trick for functional interfaces, unfortunately, as these are interfaces, and all fields in interfaces must be public - so there's no way to *hide* the constant fields in there - we could do that if Java supported `private static` fields in interfaces, which is probably not so unreasonable to consider).

2. The new code avoids some duplication which was present in the old approach when it comes to `#define` constants; these had getters both in the header Java API and in the internal constant source files, therefore almost doubling the size of code required to support a single constant. This is now gone in the new scheme; since constants are pretty uniquitous in header files, this move saves quite a bit of footprint.

3. To further reduce the number of constant holder classes, I added a very simple grouping logic, so that the same holder class can be reused across N different API points (where N can be tweaked using a JDK property). In other words, if you turn up N to a very high value (e.g. 1000) you end up with the same generation scheme we had before, where all constants end up in the same class (or very few different classes). The default is set to 5, which, in my experiments has proven to be a good balance between fast startup (negligible difference when compared to the dynamic constant classfile approach) and artifact footprint (what comes out of jextract has basically the same size as before).

In terms of implementation, the main idea was to double down on the XYZBuilder interface and, at the same time, simplify it. We now have the following abstractions:

* JavaSourceBuilder: basic abstraction which deals with creating java sources, maintains a backing string buffer, etc. It has a main API, which is used by `OutputFactory` to add variables, functions, constants, structs, functional interfaces, typedefs. This probably represents the biggest architectural shift in the code: previously, OutputFactory was responsible for low-level operation, such as starting and ending source generation - now all these details are encapsulated in the builders and OutputFactory deals with a much simplified API (with the result that OutputFactory became a lot simpler in places, such as `visitVariable` and `visitScoped`).

* HeaderFileBuilder: as before, this is the builder that is responsible for holding the contents of an header file. It can be used to add vars (globals), functions, structs, typedefs and functional interfaces. 

* NestedClassBuilder: as before, this is a builder for all things that are nested inside the main header file.

* ConstantBuilder: this is a new class which partly plays the role played by the former `ConstantHelper` - it is now a type of `NestedClassBuilder` which has logic to emit constants. Note that JavaSourceBuilder has a method, namely `emitWithConstantClass` which spins a new constant holder on the fly, and allows some code generation to depends on the constants being added to that holder. ConstantBuilder also has a main API for adding method/var handles, segment/address constants, etc. Each of these methods return a `Constant` instance, which has few methods that can be useful when generating code (e.g. to generate the access expression needed to get the constant, or to emit a getter for the constant inside another builder).

* StructBuilder, FunctionalInterfaceBuilder, TypedefBuilder; as before - the only notable difference being that now StructBuilder extends ConstantBuilder, so it can be the holder of its own constants.

I think this covers the main things that have been touched by this patch - feel free to ask questions, in case I missed something!


[1] - https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011887.html


Commit messages:
 - Further consolidate constant generation code
 - Consolidate generation of constant code
 - Merge branch 'foreign-jextract' into jextract-clean-source-gen
 - Add JDK parameter to configure number of constants per synthetic class
 - Encapsulate logic for handling nested anon structs
 - Fix (pre-existing) issue with iteration order of prefixes in StructBuilder
 - Add try finally before popping struct names
 - Fix issues with cross-split typedefs
 - Fully encapsulate header splitting
 - Preserve public API by not using functional interfaces as constant holders (this "leaks" constant fields outside).
 - ... and 15 more: https://git.openjdk.java.net/panama-foreign/compare/bbf0f35c...a902cfdc

Changes: https://git.openjdk.java.net/panama-foreign/pull/450/files
 Webrev: https://webrevs.openjdk.java.net/?repo=panama-foreign&pr=450&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261642
  Stats: 2765 lines in 16 files changed: 774 ins; 1578 del; 413 mod
  Patch: https://git.openjdk.java.net/panama-foreign/pull/450.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-foreign pull/450/head:pull/450

PR: https://git.openjdk.java.net/panama-foreign/pull/450

More information about the panama-dev mailing list