[foreign-jextract] RFR: 8261642: improve jextract source generation mode and remove class generation [v2]

Jorn Vernee jvernee at openjdk.java.net
Mon Feb 15 13:59:53 UTC 2021

On Mon, 15 Feb 2021 11:39:19 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This patch replaces the existing code generators used by jextract by a single, unified, source generation strategy.
>> As discussed in [1], there are some trade-offs between source generation and classfile generation; for instance, when using classfile translation, we can take advantage of certain features of the JVM bytecode, such as dynamic constants, which makes it easy to generate big files full of constants which are only initialized on a *by-need* basis.
>> Classfile generation is not without issue though; first, it is almost always easier to deal with sourcefiles rather than classes - both in terms of building environment and also in terms of sharing (e.g. uploading generated bindings on a GitHub project). This is for instance why we have always used source-based bindings for libclang to power jextract. We feel that users will probably be subject to similar limitations.
>> Secondly, IDEs can sometimes be picky about what's in the classfile - and I have seen cases of IntelliJ failing to decompile classes containing dynamic constants.
>> Last, but not least, classfiles cannot contain information which would be helpful for the user, such as javadoc. Adding javadoc comments, containing at the very least the original signatures of the C functions/structs would be very helpful when developing applications which use jextract generated code.
>> For these reasons, I have decided to explore a source-based strategy for jextract. Note that this does not alter the set of options available in the command line, or the default behavior of jextract: that is, jextract will still emit classfiles if you do not specify --sources; but instead of having _two_ different backends (as we do now), this patch replaces both backends with a single source-based backend. We can discuss, as followup changes, as to whether we should just make source generation the default and drop classfile generation, but that's for another day.
>> The basic idea behind the approach is to stash constants in a separate class; that is, when we need a native method handle, we can generate a class which contains all the constants relative to that native method handle. This allows us to only load the constants that are used for a given native invocation.
>> One obvious problem of this approach is the sheer number of classes that it generates (although only a subset of them are truly loaded and initialized). To counteract that, this patch adopts several strategies:
>> 1. first, it reuses a container class, if that's already available, for storing the constants. For instance, if we are generating a struct class, we can just emit the constants inside the struct class, since the constants are private to that struct anyway (we cannot do that trick for functional interfaces, unfortunately, as these are interfaces, and all fields in interfaces must be public - so there's no way to *hide* the constant fields in there - we could do that if Java supported `private static` fields in interfaces, which is probably not so unreasonable to consider).
>> 2. The new code avoids some duplication which was present in the old approach when it comes to `#define` constants; these had getters both in the header Java API and in the internal constant source files, therefore almost doubling the size of code required to support a single constant. This is now gone in the new scheme; since constants are pretty uniquitous in header files, this move saves quite a bit of footprint.
>> 3. To further reduce the number of constant holder classes, I added a very simple grouping logic, so that the same holder class can be reused across N different API points (where N can be tweaked using a JDK property). In other words, if you turn up N to a very high value (e.g. 1000) you end up with the same generation scheme we had before, where all constants end up in the same class (or very few different classes). The default is set to 5, which, in my experiments has proven to be a good balance between fast startup (negligible difference when compared to the dynamic constant classfile approach) and artifact footprint (what comes out of jextract has basically the same size as before).
>> In terms of implementation, the main idea was to double down on the XYZBuilder interface and, at the same time, simplify it. We now have the following abstractions:
>> * JavaSourceBuilder: basic abstraction which deals with creating java sources, maintains a backing string buffer, etc. It has a main API, which is used by `OutputFactory` to add variables, functions, constants, structs, functional interfaces, typedefs. This probably represents the biggest architectural shift in the code: previously, OutputFactory was responsible for low-level operation, such as starting and ending source generation - now all these details are encapsulated in the builders and OutputFactory deals with a much simplified API (with the result that OutputFactory became a lot simpler in places, such as `visitVariable` and `visitScoped`). Note that, since this now includes the former StringSourceBuilder featuring the string buffer, a lot of indirections of the like `builder.xyz...` have been replaced with just `xyz` which makes the code easier to follow.
>> * HeaderFileBuilder: as before, this is the builder that is responsible for holding the contents of an header file. It can be used to add vars (globals), functions, structs, typedefs and functional interfaces. 
>> * NestedClassBuilder: as before, this is a builder for all things that are nested inside the main header file.
>> * ConstantBuilder: this is a new class which partly plays the role played by the former `ConstantHelper` - it is now a type of `NestedClassBuilder` which has logic to emit constants. Note that JavaSourceBuilder has a method, namely `emitWithConstantClass` which spins a new constant holder on the fly, and allows some code generation to depends on the constants being added to that holder. ConstantBuilder also has a main API for adding method/var handles, segment/address constants, etc. Each of these methods return a `Constant` instance, which has few methods that can be useful when generating code (e.g. to generate the access expression needed to get the constant, or to emit a getter for the constant inside another builder).
>> * StructBuilder, FunctionalInterfaceBuilder, TypedefBuilder; as before - the only notable difference being that now StructBuilder extends ConstantBuilder, so it can be the holder of its own constants.
>> I think this covers the main things that have been touched by this patch - feel free to ask questions, in case I missed something!
>> Cheers
>> [1] - https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011887.html
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>   Remove template for @C annotation
>   Remove debugging/parameter info from jextract compilation

LGTM, some minor comments inline

src/jdk.incubator.jextract/share/classes/jdk/internal/jextract/impl/ConstantBuilder.java line 178:

> 176:         Constant constant = namesGenerated.get(lookupName);
> 177:         if (constant == null) {
> 178:             constant = constantFactory.get();

Maybe a check or assert could be put here to check if the created constant has the expected `kind`? (Just to be sure)

src/jdk.incubator.jextract/share/classes/jdk/internal/jextract/impl/ConstantBuilder.java line 267:

> 265:             append(")");
> 266:         } else if (l instanceof GroupLayout) {
> 267:             if (((GroupLayout) l).isStruct()) {

Good place to use `instanceof` pattern matching.

src/jdk.incubator.jextract/share/classes/jdk/internal/jextract/impl/ConstantBuilder.java line 318:

> 316:                 append(delim);
> 317:                 indent();
> 318:                 emitLayoutString(e);

Looks like this is emitting all layouts inline? I guess there's not much that can be easily done about that in the current code (or maybe you have an idea). Maybe in the future a scheme where the return/params are references to other constants could reduce overhead further. (e.g. it seems common that if a func desc references a struct layout, then we also have a struct class that already declares that layout).

src/jdk.incubator.jextract/share/classes/jdk/internal/jextract/impl/JavaSourceBuilder.java line 270:

> 268:             }
> 269:             constant_counter = 0;
> 270:             constantBuilder = new ConstantBuilder(this, Kind.CLASS, javaName + "_constants");

Using the java name of some constant to uniqueify the name of the constant class seems a bit strange to me... Couldn't there be collisions? (e.g. if struct and unction are named the same?) Why not just use to id/number here like before?

src/jdk.incubator.jextract/share/classes/jdk/internal/jextract/impl/StructBuilder.java line 57:

> 55:         this.structType = structType;
> 56:         prefixElementNames = new ArrayDeque<>();
> 57:         classBegin();

Kinda strange that the constructor here calls `classBegin()` but OutputFactory calls `classEnd()`. Could both calls be moved to `OutputFactory`?


Marked as reviewed by jvernee (Committer).

PR: https://git.openjdk.java.net/panama-foreign/pull/450

More information about the panama-dev mailing list