a new job for <init>: "static init factories"

John Rose john.r.rose at oracle.com
Wed Apr 24 18:00:55 UTC 2019

We need a VM-level API point for running a constructor
of an inline (by-value) type, which is distinct from the
current way of running constructors.  Currently, constructors
are invoked by running invokespecial on a special internal
method named <init>, passing a blank fresh instance of
the required class.  The blankness of the fresh instance
is tracked by the verifier using very special rules which
are triggered by the mention of the very special name <init>.
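For comparison, here is the classic mechanism as seen from today's Java. The sketch uses an ordinary (identity) class named `Point`, not an inline type. The bytecode in the comment is what javac emits for `new Point(3, 4)`, with class names abbreviated. `MethodHandles.Lookup.findConstructor` names the constructor by its void-returning `<init>` type, yet the handle it returns behaves like a factory that yields a `Point`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ClassicConstructor {
    // An ordinary identity class; javac compiles its constructor to the
    // instance method Point.<init>:(II)V.
    static final class Point {
        final int x, y;
        Point(int x1, int y1) { x = x1; y = y1; }
    }

    // javac translates the body below to (class names abbreviated):
    //   new           Point               // allocate a blank instance
    //   dup                               // one reference for <init>, one to return
    //   iconst_3
    //   iconst_4
    //   invokespecial Point.<init>:(II)V  // initialize the blank instance in place
    //   areturn
    static Point myPoint() { return new Point(3, 4); }

    // The type used to look up a constructor mirrors the <init> descriptor: void return.
    static MethodType constructorLookupType() {
        return MethodType.methodType(void.class, int.class, int.class);
    }

    // The handle hides new/dup/invokespecial and simply returns a Point.
    static MethodHandle constructorHandle() {
        try {
            return MethodHandles.lookup().findConstructor(Point.class, constructorLookupType());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(constructorLookupType().toMethodDescriptorString()); // (II)V
        MethodHandle mh = constructorHandle();
        System.out.println(mh.type());                                          // (int,int)Point
        Point p = (Point) mh.invokeExact(3, 4);
        System.out.println(p.x + "," + p.y);                                    // 3,4
    }
}
```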

We can't use this mechanism for constructing value types
because there is no way to perform side effects on a
value type instance, blank or not.

The most natural way to create a value type instance,
in the JVM, is to run invokestatic on an appropriately
named static factory method.  The name of this method
is a convention which is known to various parties, notably
the static compiler (javac) and reflection (jlr.Constructor).

After some prototyping, I can say that it is reasonable and simple
to reuse the string "<init>" as the name targeted by an invokestatic
which translates a constructor of an inline type.  For example:

inline class Point { int x, y; Point(int x1, int y1) { x=x1; y=y1; } }
// static Point.<init>(II)Point { vdefault[Point] … }
class Client { static Point myPoint() { return new Point(3,4); } }
// Client.myPoint() { … invokestatic[Point.<init>(II)Point] … }
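Java source today cannot declare an `inline class` or a method named `<init>`, so the sketch below only approximates this translation. An ordinary final class `Point` stands in for the inline type, and a static method with the hypothetical name `make` stands in for `Point.<init>(II)Point`. What the sketch does show faithfully is the difference in descriptor shape. A classic constructor takes its receiver implicitly and returns void. A static init factory takes only the arguments and returns the class.

```java
import java.lang.invoke.MethodType;

public class FactoryShape {
    // Stand-in for `inline class Point`; this is an ordinary class, not an inline type.
    static final class Point {
        final int x, y;
        private Point(int x1, int y1) { x = x1; y = y1; }

        // Hypothetical stand-in for the static init factory Point.<init>(II)Point.
        // The proposal names it <init>, which Java source cannot spell.
        static Point make(int x1, int y1) { return new Point(x1, y1); }
    }

    // Classic constructor descriptor: the receiver is implicit, the result is void.
    static String classicDescriptor() {
        return MethodType.methodType(void.class, int.class, int.class).toMethodDescriptorString();
    }

    // Static init factory descriptor: no receiver, returns the containing class.
    static String factoryDescriptor() {
        return MethodType.methodType(Point.class, int.class, int.class).toMethodDescriptorString();
    }

    // Mirrors Client.myPoint(): the call site is a single invokestatic.
    static Point myPoint() { return Point.make(3, 4); }

    public static void main(String[] args) {
        System.out.println(classicDescriptor()); // (II)V
        System.out.println(factoryDescriptor()); // (II)LFactoryShape$Point; in the unnamed package
        Point p = myPoint();
        System.out.println(p.x + "," + p.y);     // 3,4
    }
}
```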

To do this, we will need to make some changes to the JVM
specification (and implementations).  Here are the changes
I propose:

* Relax constraints on CONSTANT_Methodref and CONSTANT_NameAndType,
  allowing free use of <init> as if it were a regular method name.
* Retain all restrictions on use of <init> via invokespecial.
* Allow invokestatic to mention <init> (but extend this to no other bytecode).
* Retain all restrictions on definition of <init> methods *in regular non-inline classfiles*.
* Allow an inline classfile to define an <init> method, only with ACC_STATIC.
* Require that the type returned by such an <init> method is the containing class.
  (Extra rider:  If the class is non-denotable, a.k.a. hidden, the returned type must be Object.)
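The definition-side rules in this list are small enough to state as a single predicate. The sketch below is a hypothetical model, not HotSpot's classfile parser. `isLegalInitDefinition` and its parameters are invented for illustration, and class names use the JVM's internal form (`Point`, `java/lang/Object`).

```java
public class InitDefinitionRules {
    // Return-type part of a method descriptor, e.g. "V" for "(II)V".
    static String returnType(String descriptor) {
        return descriptor.substring(descriptor.indexOf(')') + 1);
    }

    // Hypothetical model of the proposed rules for a method named <init>.
    static boolean isLegalInitDefinition(String className, boolean inlineClass, boolean hiddenClass,
                                         boolean isStatic, String descriptor) {
        String ret = returnType(descriptor);
        if (!inlineClass) {
            // Regular classfile, rules unchanged: an instance method returning void.
            return !isStatic && ret.equals("V");
        }
        // Inline classfile: only static <init> factories are allowed...
        if (!isStatic) {
            return false;
        }
        // ...and they must return the containing class, or Object if the class is hidden.
        String expected = hiddenClass ? "Ljava/lang/Object;" : "L" + className + ";";
        return ret.equals(expected);
    }

    public static void main(String[] args) {
        System.out.println(isLegalInitDefinition("Point", false, false, false, "(II)V"));              // true
        System.out.println(isLegalInitDefinition("Point", false, false, true, "(II)LPoint;"));         // false
        System.out.println(isLegalInitDefinition("Point", true, false, true, "(II)LPoint;"));          // true
        System.out.println(isLegalInitDefinition("Point", true, false, false, "(II)V"));               // false
        System.out.println(isLegalInitDefinition("Point", true, true, true, "(II)Ljava/lang/Object;")); // true
        System.out.println(isLegalInitDefinition("Point", true, true, true, "(II)LPoint;"));           // false
    }
}
```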

These changes ensure that there are only two kinds of methods
named <init>: classic by-reference object constructors and
new by-value "static init factories".  The specification ensures
that these two kinds of methods, both named <init>, can never
be confused.  The basic mechanism for ensuring this is that
one kind of method is defined as non-static and the other is
defined as static, and invokespecial can never accidentally invoke
a static method, nor can invokestatic invoke a non-static one.

Note that a CONSTANT_Methodref naming <init> can carry either kind
of descriptor.  This is not ambiguous, since every use of such a
constant is coupled with an indicator (an opcode or ref-kind)
which tells if the use is of a static method or not.

Note also that there can be "crazy" references to the name
<init> such as under the type "()I" (no args returning int).
These references are harmless because they will never link
to a definition of that signature.  This is true not because there
are limitations on the form of uses of <init>, but because there
are strong limitations on the possible definitions of <init>.
A "crazy" use has no impact, other than causing an eventual
linkage error.  We could try to add more limits against crazy
uses of <init>, but that does not seem to be necessary, except
perhaps as a "defense in depth" move.
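The argument above is that uses of `<init>` need no extra policing because definitions are tightly constrained. That argument can be checked against a toy model of linkage. Everything here is hypothetical: the `MethodDef` type, the `link` method, and its string results. The model simplifies real resolution to a lookup by name and descriptor, followed by a check that the opcode agrees with `ACC_STATIC`. The error labels are chosen to echo the JVM's error names.

```java
import java.util.List;

public class InitLinkageModel {
    // Only the two invocation opcodes that matter here.
    enum Opcode { INVOKESPECIAL, INVOKESTATIC }

    // Hypothetical summary of one method definition in a classfile.
    static final class MethodDef {
        final String name;
        final String descriptor;
        final boolean isStatic;

        MethodDef(String name, String descriptor, boolean isStatic) {
            this.name = name;
            this.descriptor = descriptor;
            this.isStatic = isStatic;
        }
    }

    // Toy linkage: find a definition by name and descriptor, then require that the
    // opcode agrees with its ACC_STATIC flag. Returns a label instead of throwing.
    static String link(List<MethodDef> defs, Opcode op, String name, String descriptor) {
        for (MethodDef d : defs) {
            if (d.name.equals(name) && d.descriptor.equals(descriptor)) {
                boolean wantStatic = (op == Opcode.INVOKESTATIC);
                return d.isStatic == wantStatic ? "linked" : "IncompatibleClassChangeError";
            }
        }
        return "NoSuchMethodError";
    }

    // A regular class: only instance <init> methods returning void.
    static final List<MethodDef> IDENTITY_POINT = List.of(new MethodDef("<init>", "(II)V", false));
    // An inline class: only static <init> factories returning the class itself.
    static final List<MethodDef> INLINE_POINT = List.of(new MethodDef("<init>", "(II)LPoint;", true));

    public static void main(String[] args) {
        System.out.println(link(IDENTITY_POINT, Opcode.INVOKESPECIAL, "<init>", "(II)V"));    // linked
        System.out.println(link(INLINE_POINT, Opcode.INVOKESTATIC, "<init>", "(II)LPoint;")); // linked
        // Wrong opcode for the definition: rejected, never confused with the other kind.
        System.out.println(link(IDENTITY_POINT, Opcode.INVOKESTATIC, "<init>", "(II)V"));     // IncompatibleClassChangeError
        // A "crazy" reference such as <init>()I is well-formed but finds nothing to link to.
        System.out.println(link(INLINE_POINT, Opcode.INVOKESTATIC, "<init>", "()I"));         // NoSuchMethodError
    }
}
```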

Additional changes are needed for reflection:

* Allow new static init factories to be wrapped in `jlr.Constructor`s.
* Ensure that `Constructor::newInstance` uses the right calling sequence
  for static init factories.
* Ensure that `Lookup::findConstructor` can find static init factories.
* Allow `Lookup::findStatic` to find static init factories.
  (This step simplifies mapping between invokestatic and method
  handle constants.  It could be dropped.)
* Ensure that resolution of `CONSTANT_MethodHandle` continues to
  use the reference-kind option correctly.  (No actual spec change.)
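Today's JDK already distinguishes a constructor handle from a static-method handle by reference kind, which is what the last bullet relies on. The sketch below cracks two direct handles with `Lookup.revealDirect`. One comes from `findConstructor`. The other comes from `findStatic` on an ordinary static factory, again given the hypothetical name `make`, since Java cannot define a static `<init>`. `MethodHandleInfo` reports `REF_newInvokeSpecial`, the name `<init>`, and a void return for the first handle, and `REF_invokeStatic` for the second.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class RefKinds {
    static final class Point {
        final int x, y;
        Point(int x1, int y1) { x = x1; y = y1; }
        // Hypothetical stand-in for a static init factory.
        static Point make(int x1, int y1) { return new Point(x1, y1); }
    }

    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();

    // Cracks the direct handle produced by findConstructor.
    static MethodHandleInfo crackConstructor() {
        try {
            MethodHandle mh = LOOKUP.findConstructor(Point.class,
                    MethodType.methodType(void.class, int.class, int.class));
            return LOOKUP.revealDirect(mh);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Cracks the direct handle produced by findStatic on the stand-in factory.
    static MethodHandleInfo crackFactory() {
        try {
            MethodHandle mh = LOOKUP.findStatic(Point.class, "make",
                    MethodType.methodType(Point.class, int.class, int.class));
            return LOOKUP.revealDirect(mh);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MethodHandleInfo ctor = crackConstructor();
        MethodHandleInfo fact = crackFactory();
        System.out.println(MethodHandleInfo.referenceKindToString(ctor.getReferenceKind())
                + " " + ctor.getName() + ctor.getMethodType()); // newInvokeSpecial <init>(int,int)void
        System.out.println(MethodHandleInfo.referenceKindToString(fact.getReferenceKind())
                + " " + fact.getName() + fact.getMethodType()); // invokeStatic make(int,int)Point
    }
}
```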

The low-level user model has a twist in it:  At the level of bytecodes,
a static init factory is just a vanilla static method, albeit with a funny
name.  But at the level of reflection, a static init factory appears to be
a constructor.  This is reasonable since a static init factory translates
an actual constructor in source code.  A `Constructor` which wraps
a static init factory will have the `ACC_STATIC` bit set in its modifiers,
allowing users who care to distinguish the two uses of the name <init>.

The query method `Class::getDeclaredMethods` will not expose
static init factories.  Only `Class::getDeclaredConstructors` will.
A similar rule applies to related API points such as `getMethod`.
The purpose of this restriction (which is optional) is to avoid
any overlap between the reflected lists of constructors and methods.
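Both reflection rules can be expressed with existing APIs. In the sketch below, `isStaticInitFactory` is a hypothetical helper showing how a caller could tell the two uses of `<init>` apart through `Modifier.isStatic`. On current JDKs no `Constructor` reports `ACC_STATIC`, so the helper returns false. The second check confirms the present-day behavior that the proposal preserves: `getDeclaredMethods` never reports a member named `<init>`.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class ReflectionView {
    // Ordinary class standing in for the reflected type.
    static final class Point {
        final int x, y;
        Point(int x1, int y1) { x = x1; y = y1; }
        int sum() { return x + y; }
    }

    // Hypothetical helper: under the proposal, a Constructor wrapping a static init
    // factory would report ACC_STATIC in getModifiers(). Today this is always false.
    static boolean isStaticInitFactory(Constructor<?> c) {
        return Modifier.isStatic(c.getModifiers());
    }

    // True if any reflected method is named <init>. Constructors are reported only
    // by getDeclaredConstructors, so this is false on current JDKs.
    static boolean methodsMentionInit(Class<?> k) {
        return Arrays.stream(k.getDeclaredMethods())
                .map(Method::getName)
                .anyMatch("<init>"::equals);
    }

    public static void main(String[] args) {
        Constructor<?>[] ctors = Point.class.getDeclaredConstructors();
        System.out.println(ctors.length);                    // 1
        System.out.println(isStaticInitFactory(ctors[0]));   // false
        System.out.println(methodsMentionInit(Point.class)); // false
    }
}
```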

Because the `java.lang.invoke` API is lower-level than `java.lang.reflect`,
it should (probably) be willing to treat static init factories as vanilla
static methods.  Thus `findStatic` can return a handle to such a factory.
Because `java.lang.invoke` also integrates (via "unreflection") with
`java.lang.reflect`, `findConstructor` should be willing to treat
static init factories as constructors.  Perhaps one of these methods
can be suppressed, but it seems reasonable to allow both in
`java.lang.invoke`, because of the two-sided positioning of that
API layer.
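Current JDKs already show this two-sided positioning in part. A constructor handle from `findConstructor` and a static-factory handle from `findStatic` have the same type, so callers cannot tell them apart by shape. What today's `findStatic` will not do is accept the name `<init>`; it rejects it with `NoSuchMethodException`. The proposal would relax that restriction for static init factories. As before, `Point` is an ordinary class and `make` is a hypothetical stand-in for a factory named `<init>`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class InvokeView {
    static final class Point {
        final int x, y;
        Point(int x1, int y1) { x = x1; y = y1; }
        // Hypothetical stand-in for a static init factory.
        static Point make(int x1, int y1) { return new Point(x1, y1); }
    }

    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();

    // Constructor side of the API: looked up with a void return, returns a Point.
    static MethodHandle viaFindConstructor() {
        try {
            return LOOKUP.findConstructor(Point.class, MethodType.methodType(void.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Static-method side of the API: the factory is a vanilla static method.
    static MethodHandle viaFindStatic() {
        try {
            return LOOKUP.findStatic(Point.class, "make", MethodType.methodType(Point.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Current JDKs reject the name "<init>" in findStatic (NoSuchMethodException).
    static boolean findStaticAcceptsInitName() {
        try {
            LOOKUP.findStatic(Point.class, "<init>", MethodType.methodType(Point.class, int.class, int.class));
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle c = viaFindConstructor();
        MethodHandle s = viaFindStatic();
        System.out.println(c.type() + " " + s.type());   // (int,int)Point (int,int)Point
        Point p = (Point) c.invokeExact(1, 2);
        Point q = (Point) s.invokeExact(1, 2);
        System.out.println(p.x == q.x && p.y == q.y);      // true
        System.out.println(findStaticAcceptsInitName());   // false
    }
}
```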

I posted a HotSpot POC implementation here:

Comments, please?

— John

More information about the valhalla-spec-observers mailing list