Reified generics - shadow class edition

Remi Forax forax at
Tue Sep 18 10:32:16 UTC 2018

Reified generics - shadow class edition.

I believe that trying to make method descriptors variant is a bad idea: it comes from the model 1...3 experiments, but it's an artifact of those implementations, not a concept.
Here I describe a way to keep generics erased even if they are reified.

If the descriptor is erased, we need a way to get the reified type argument at runtime, so we can use 'checkcast' to verify that the parameterized arguments have the right type.
For example,

  class Holder<any E> {
    E element;

    <when E != void>  // can be specialized to throw a NoSuchMethodError if E is void
    E get() {
      return element;
    }

    <when E != void>
    void set(E element) {
      this.element = element;
    }
  }

will be translated into:

  class Holder<any E> {
    Object element;

    Object get() {
      return element;
    }

    void set(Object element) {
      element checkcast E
      // verify that element is an instance of the type argument of E here
      this.element = element;
    }
  }
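To make the erased translation concrete, here is a small plain-Java simulation in which the reified type argument is kept as a `Class` object and `checkcast E` becomes `Class.cast`. The class name `ErasedHolder` and its constructor parameter are hypothetical; in the proposal the type argument would come from the shadow class's class data, not from a field of each instance.

```java
// Hypothetical simulation: the reified type argument E is carried as a Class
// object, standing in for the class data of a shadow class.
final class ErasedHolder {
    private final Class<?> typeArgument; // plays the role of the reified E
    private Object element;              // erased field, as in the translation

    ErasedHolder(Class<?> typeArgument) {
        this.typeArgument = typeArgument;
    }

    Object get() {
        return element;
    }

    void set(Object element) {
        // Equivalent of "element checkcast E": throws ClassCastException on a mismatch.
        this.element = typeArgument.cast(element);
    }
}
```

With `new ErasedHolder(String.class)`, storing a String succeeds while storing an Integer throws a ClassCastException, which is the check the `checkcast` in set() is meant to provide.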

Now, to bridge the gap, we also need:
- a way to explain to the VM at runtime that the field 'element' is specialized (if it's a value type)
- a way to explain to the VM at runtime that the methods get and set have different implementations

For that, I propose a new mechanism in the VM called master class/shadow class, which is a way to define a specialized class, the shadow class, from a template class, the master class. In my example, Holder is the master class and Holder<Complex>, with Complex a value type, is a shadow class 'derived' at runtime from the master class.

This mechanism is more general than just supporting type specialization in the VM because
- we do not want to inject the Java generics semantics into the VM, nor the Scala semantics, nor the Kotlin semantics, etc.
- we can support more use cases; for example, other languages can associate a constant, like an int, with a class, as in C++.

So the idea is to introduce two things that work together:
1) implement in the VM a mechanism that allows constant objects to be added as supplementary values (class data) when defining a class
2) use a bootstrap method (to "go meta", as John said) to specialize the fields and methods of such a class

Those two features may be cleanly separated in the future, but I'm not sure how to do that, so for now, let's say they are two parts of the same feature, the master class/shadow class feature.

For (1), we need a class file attribute that describes the class of each class data. We don't need to name them; they can be positional (for Java generics, we may introduce another attribute, or reuse an existing one, to find the names of the class data if they are type parameters).
For (2), we need to specify a bootstrap method that will be called to describe how the specialization should be done.

Treating (1) and (2) as a single feature means the same class file attribute can define both the class data and the bootstrap method.
The MasterClass attribute:

  MasterClass_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 number_of_class_data;
    {   u2 descriptor;
        u2 default_value;
    } class_data[number_of_class_data];
    u2 bootstrap_method_attr_index;
    u2 name_and_type_index;
  }
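As a sketch of how a VM or a tool might read this layout, the following Java code decodes the attribute bytes in declaration order (big-endian, like the rest of a class file). The record names and the `attribute_length` sanity check are assumptions for illustration; the constant pool indexes are left unresolved.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory form of one class_data entry (both are constant pool indexes).
record ClassDataEntry(int descriptorIndex, int defaultValueIndex) {}

// Hypothetical in-memory form of the proposed MasterClass attribute.
record MasterClassAttribute(int attributeNameIndex,
                            List<ClassDataEntry> classData,
                            int bootstrapMethodAttrIndex,
                            int nameAndTypeIndex) {

    static MasterClassAttribute parse(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int nameIndex = in.readUnsignedShort();          // u2 attribute_name_index
        long length = in.readInt() & 0xFFFFFFFFL;        // u4 attribute_length
        int count = in.readUnsignedShort();              // u2 number_of_class_data
        List<ClassDataEntry> entries = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int descriptor = in.readUnsignedShort();     // u2 descriptor
            int defaultValue = in.readUnsignedShort();   // u2 default_value
            entries.add(new ClassDataEntry(descriptor, defaultValue));
        }
        int bsmIndex = in.readUnsignedShort();           // u2 bootstrap_method_attr_index
        int natIndex = in.readUnsignedShort();           // u2 name_and_type_index
        // attribute_length counts the bytes after the 6-byte attribute header.
        if (length != 2 + 4L * count + 4) {
            throw new IOException("bad attribute_length: " + length);
        }
        return new MasterClassAttribute(nameIndex, List.copyOf(entries), bsmIndex, natIndex);
    }
}
```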

The class data descriptor is a field descriptor that describes the class of the class data; it should be one of int, long, float, double, String, MethodType or MethodHandle, i.e. the types of the constants that can appear in the constant pool.
The default value is a constant pool item that defines the value that will be used if the shadow class is created with no class data.
The bootstrap method is called to derive a shadow class from a master class if the shadow class has not been created yet. The bootstrap method takes as parameters a Lookup configured on the master class, a name, a Class (the type of the name_and_type) and an array of Object containing the class data (+ optional extra bootstrap arguments), and returns an instance of java.lang.invoke.Classy.
The type of the name_and_type has to be a subtype of java.lang.invoke.Classy.
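A minimal model of the "derive only if not yet created" rule, assuming shadow classes are cached per (master class, class data) pair. `ShadowClassCache` and its nested `Bootstrap` interface are hypothetical; the bootstrap method returns a plain `Object` here because `java.lang.invoke.Classy` does not exist today, and `Lookup.in(master)` only approximates a lookup configured on the master class.

```java
import java.lang.invoke.MethodHandles;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of shadow class derivation: the bootstrap method runs
// at most once per (master class, class data) pair and its result is reused.
final class ShadowClassCache {

    // Hypothetical bootstrap shape: (lookup, name, type, class data) -> result.
    // The proposal returns a java.lang.invoke.Classy; Object is used here instead.
    @FunctionalInterface
    interface Bootstrap {
        Object apply(MethodHandles.Lookup lookup, String name, Class<?> type, Object[] classData);
    }

    private record Key(Class<?> master, List<Object> classData) {}

    private final Map<Key, Object> shadows = new ConcurrentHashMap<>();

    Object derive(Class<?> master, Object[] classData, Bootstrap bsm,
                  String name, Class<?> type) {
        Key key = new Key(master, Arrays.asList(classData.clone()));
        // computeIfAbsent only calls the bootstrap method when no shadow class
        // exists yet for this key (a null result is not cached).
        return shadows.computeIfAbsent(key, k ->
                bsm.apply(MethodHandles.lookup().in(master), name, type, classData.clone()));
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per key, which is roughly the guarantee a VM would want when several threads race to create the same shadow class.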

The interface java.lang.invoke.Classy describes how to specialize a shadow class from a master class:
  interface Classy {
    Class<?> superclass();
    Class<?>[] interfaces();
    String fieldDescriptor(String field, String descriptor);
    MethodHandle method(String name, String descriptor);
  }

superclass() returns the superclass of the shadow class; it has to be either a specialization of the master class's superclass (a subtype of it) or the master class's superclass itself.
interfaces() returns the interfaces of the shadow class; each interface has to be a subtype of the corresponding interface of the master class, or that interface itself.
fieldDescriptor() is called for each field of the master class, with the field name and the field descriptor in the master class; it returns the descriptor of the corresponding field of the shadow class, whose type must be a subtype of the master class field's type. If it returns null, the field doesn't exist and a NoSuchFieldError will be thrown upon access.
method() is called for each method of the master class, with the method name and its method descriptor; it returns a method handle corresponding to the specialization of the master class method in the shadow class. The method handle type has to be exactly the same as the descriptor passed as parameter. If it returns null, the method doesn't exist and a NoSuchMethodError will be thrown upon access.
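To illustrate this contract, here is a sketch of a Classy for the Holder example, assuming the class data is just the type argument as a `Class`. `Classy` is redeclared locally because `java.lang.invoke.Classy` is only proposed, and an erased `Holder` plays the master class. When E is void, the field and the methods are reported as absent (the `<when E != void>` case); otherwise the field is retyped to E and set() gets a `Class.cast` filter standing in for `checkcast E`. The handles returned by `findVirtual` take the receiver as an extra leading parameter; this sketch assumes the receiver is not part of the "exactly the same type" comparison.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Local stand-in for the proposed java.lang.invoke.Classy (hypothetical).
interface Classy {
    Class<?> superclass();
    Class<?>[] interfaces();
    String fieldDescriptor(String field, String descriptor);
    MethodHandle method(String name, String descriptor);
}

// Erased master class, as produced by the translation of Holder<any E>.
class Holder {
    Object element;
    Object get() { return element; }
    void set(Object element) { this.element = element; }
}

// Hypothetical Classy deriving the shadow class Holder<E> from the type argument E.
final class HolderClassy implements Classy {
    private final Class<?> typeArgument;

    HolderClassy(Class<?> typeArgument) {
        this.typeArgument = typeArgument;
    }

    public Class<?> superclass() { return Object.class; }       // unchanged
    public Class<?>[] interfaces() { return new Class<?>[0]; }  // unchanged

    public String fieldDescriptor(String field, String descriptor) {
        if (!field.equals("element")) {
            return descriptor;                  // other fields keep their type
        }
        if (typeArgument == void.class) {
            return null;                        // no element field when E is void
        }
        return typeArgument.descriptorString(); // e.g. "Ljava/lang/String;"
    }

    public MethodHandle method(String name, String descriptor) {
        if (typeArgument == void.class) {
            return null;                        // <when E != void>: method absent
        }
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.fromMethodDescriptorString(
                    descriptor, HolderClassy.class.getClassLoader());
            // Master implementation; virtual handles take the receiver first.
            MethodHandle target = lookup.findVirtual(Holder.class, name, type);
            if (name.equals("set")) {
                // Specialize set(): cast the argument to E, like "checkcast E".
                MethodHandle check = lookup.findVirtual(Class.class, "cast",
                                MethodType.methodType(Object.class, Object.class))
                        .bindTo(typeArgument);
                target = MethodHandles.filterArguments(target, 1, check);
            }
            return target;
        } catch (ReflectiveOperationException e) {
            return null;                        // unknown method: treated as absent
        }
    }
}
```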

The idea here is that a shadow class is a covariant variant of the master class: a field can be replaced by one of a subtype, and a method can be replaced by a specialized variant with the same parameter types. This allows any shadow class to be accessed using any opcode that takes the master class as owner: getfield, putfield and all invoke* opcodes. For getfield, a value type can be buffered by the VM as Object or as an interface. For putfield, the VM has to perform an extra check at runtime (like the extra check on an array store, aastore, because arrays are covariant).
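The array analogy can be observed in today's Java: because `String[]` is a subtype of `Object[]`, every reference store into an array is checked at runtime and can throw ArrayStoreException. The helper below (a hypothetical name) only demonstrates that existing array behavior; the putfield check on shadow classes is part of the proposal and has no Java equivalent yet.

```java
// Existing Java behavior: arrays are covariant, so each reference store
// (the aastore opcode) is checked against the array's actual component type.
final class CovariantStoreDemo {
    // Returns true if storing value at index 0 is rejected at runtime.
    static boolean storeFails(Object[] array, Object value) {
        try {
            array[0] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }
}
```

Passing a `new String[1]` as the `Object[]` argument, storing "ok" succeeds while storing an Integer fails, even though both stores compile against `Object[]`.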

The VM can call the methods of Classy at any point in time, so calls to method() can be lazy or not (the other methods provide the information needed to determine the layout of the class, so they cannot be called lazily).

At runtime, for the VM, a shadow class is a subtype of its master class.

The fact that the shadow class is a subtype of the master class allows Java wildcards to be desugared as the master class.
A shadow class has no special encoding in the bytecode, it only has a representation in the runtime data structure of the VM.

In order to be backward compatible, java.lang.Class is extended to also represent shadow classes, with the following methods:
- Class<?> withClassData(Object... data) returns the shadow class of a master class for the given class data.
- Object[] getClassData() returns the class data of a shadow class, or null.
- boolean isMasterClass() returns whether the current class is a master class.
- Class<?> getMasterClass() returns the master class of a shadow class, or the current class otherwise (a classical class is its own master class).
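Since java.lang.Class is final and cannot be extended from user code, the following record is a hypothetical stand-in that mimics the four proposed methods. It assumes that any class without class data counts as a master class, and it uses record equality where the VM would use canonical Class identity.

```java
import java.util.List;

// Hypothetical stand-in for the proposed java.lang.Class extensions.
// classData == null means a master (or classical) class.
record RuntimeClass(Class<?> master, List<Object> classData) {

    static RuntimeClass of(Class<?> type) {
        return new RuntimeClass(type, null);
    }

    // Models Class.withClassData(Object...): the shadow class for this data.
    RuntimeClass withClassData(Object... data) {
        return new RuntimeClass(master, List.of(data));
    }

    // Models Class.getClassData(): the class data, or null for a master class.
    Object[] getClassData() {
        return classData == null ? null : classData.toArray();
    }

    // Models Class.isMasterClass().
    boolean isMasterClass() {
        return classData == null;
    }

    // Models Class.getMasterClass(): a master class is its own master class.
    RuntimeClass getMasterClass() {
        return isMasterClass() ? this : of(master);
    }
}
```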

Reusing java.lang.Class to represent shadow classes at runtime is important because it allows reflection and java.lang.invoke to work seamlessly with shadow classes: from a user's point of view, a classical class and a shadow class are both a java.lang.Class.

There is a compatibility issue with Object.getClass(), isInstance, instanceof and checkcast: they cannot return or use the shadow class, because code like o.getClass() == ArrayList.class or o instanceof ArrayList would not work if the comparison used the shadow class. This means that getClass(), instanceof and checkcast need to check the master class of the shadow class instead of using the shadow class directly.
Note that this problem is not inherent to the shadow class, it's an artifact of the fact that the type argument is reified.

This means that we have to introduce at least one supplementary method for getClass(); a static method in Class, Class.getTheTrueRealClass(Object o), is enough. It also means that if we want to allow reified cast/instanceof/.class notation in Java, it will have to be implemented using invokedynamic/condy (again, to avoid bolting the Java generics semantics into the VM). We may also choose not to support reified cast/instanceof in Java, given that being able to specialize fields/methods is more important in terms of performance, and that we will not support reified generics of objects anyway.
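A hypothetical model of that compatibility rule: each object remembers the shadow class it was created from, while getClass()-like and instanceof-like checks only look at the master class. `Shadow`, `ShadowObject`, `getClassCompat` and `isInstanceOf` are illustrative names; only `getTheTrueRealClass` is named in the proposal.

```java
import java.util.List;

// Hypothetical runtime view of a shadow class: its master class plus class data.
record Shadow(Class<?> master, List<Object> classData) {}

// Hypothetical object model: each instance remembers its shadow class.
final class ShadowObject {
    private final Shadow shadow;

    ShadowObject(Shadow shadow) {
        this.shadow = shadow;
    }

    // getClass() as proposed: returns the master class, so that
    // "o.getClass() == ArrayList.class" keeps its current meaning.
    Class<?> getClassCompat() {
        return shadow.master();
    }

    // instanceof/checkcast as proposed: test against the master class only.
    boolean isInstanceOf(Class<?> type) {
        return type.isAssignableFrom(shadow.master());
    }

    // Models the suggested static Class.getTheTrueRealClass(Object o).
    static Shadow getTheTrueRealClass(ShadowObject o) {
        return o.shadow;
    }
}
```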

The fact that a shadow class has no representation in the classfile means that we are losing information, because if ArrayList is anyfied, with
  ArrayList<String> list = ...
list.get() is encoded in the bytecode as a call to the master class ArrayList and not a call to the shadow class ArrayList<String>. So a call to an anyfied generic class is still erased, but given that this information is available at runtime (the inline cache stores the shadow class), a JIT can easily inline the call.

With the classfile only containing classical descriptors, in terms of opcodes we only need to support a few operations:
- new on an anyfied class
- new on an anyfied array
- invocation of an anyfied method.
For all these operations, the idea is to send the class data (method data) by putting them on the stack and having a bytecode that describes them as class data/method data.
We also need a way, inside the method, to get the method data back onto the stack.

I propose to introduce two new opcodes, dataload and datastore:
- dataload takes a concatenation of field descriptors as parameter (or a method descriptor without the parentheses and return type), takes the corresponding values from the stack and stores them in a side channel.
- datastore also takes a concatenation of field descriptors as parameter and moves the data from the side channel onto the stack.

dataload is used as a prefix of new and anewarray to pass the class data that will be used to build the shadow class (if not already created).
dataload is used as a prefix of all invoke* bytecodes to pass the method data.

We also need a special reflection method in Thread, getMethodData(), that returns the method data associated with the current method as an array, or null if no method data was passed when the method was called.
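The dataload/invoke/getMethodData protocol can be simulated in plain Java with thread-locals: a pending slot models the side channel filled by dataload, and a per-thread stack records the method data each active call received. All names are hypothetical, and reading `getMethodData()` stands in for what datastore would do at the bytecode level.

```java
import java.util.ArrayList;
import java.util.function.Supplier;

// Hypothetical plain-Java simulation of the proposed dataload side channel
// and Thread.getMethodData(); real bytecode would do this inside the VM.
final class MethodDataSim {
    // Side channel filled by "dataload" and consumed by the next invoke.
    private static final ThreadLocal<Object[]> PENDING = new ThreadLocal<>();
    // Method data of each active simulated call on this thread (null entries allowed).
    private static final ThreadLocal<ArrayList<Object[]>> FRAMES =
            ThreadLocal.withInitial(ArrayList::new);

    // Models dataload: moves values from the operand stack into the side channel.
    static void dataload(Object... values) {
        PENDING.set(values);
    }

    // Models an invoke* opcode: the pending data, if any, becomes the callee's method data.
    static <R> R invoke(Supplier<R> callee) {
        Object[] data = PENDING.get();
        PENDING.remove();
        ArrayList<Object[]> frames = FRAMES.get();
        frames.add(data);                        // null when there was no dataload prefix
        try {
            return callee.get();
        } finally {
            frames.remove(frames.size() - 1);
        }
    }

    // Models Thread.getMethodData(): the current method's data, or null.
    static Object[] getMethodData() {
        ArrayList<Object[]> frames = FRAMES.get();
        return frames.isEmpty() ? null : frames.get(frames.size() - 1);
    }
}
```

In this model, a call made without a preceding dataload sees null even when its caller did receive method data; that is the behavior assumed here for an invoke that is not prefixed by dataload.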

Note that when invokedynamic is prefixed by a dataload, the bootstrap method has no access to the data; only the target of the call site will see the method data.

To summarize, I propose to implement reified generics in the VM by introducing the notion of shadow class, a class only available at runtime that has associated class data and a user-defined way to specialize fields and methods at runtime. The main advantage of this solution is that old classes will not only be able to use anyfied generics, but old code will also be optimized by JITs as if it were new code.


More information about the valhalla-spec-observers mailing list