Moving from VVT to the L-world value types (LWVT)

Remi Forax forax at univ-mlv.fr
Fri Jan 19 21:10:39 UTC 2018


I think there is an alternative encoding for Q-types that is worth considering,
see below ...

----- Original Message -----
> From: "Frederic Parain" <frederic.parain at oracle.com>
> To: "valhalla-dev" <valhalla-dev at openjdk.java.net>
> Sent: Tuesday, 16 January 2018 21:56:11
> Subject: Moving from VVT to the L-world value types (LWVT)

> Here’s an attempt to bootstrap the L-world exploration, where java.lang.Object
> is the top type of all value classes (as discussed during the November meetings
> in Burlington).
> 
> This proposal tries to evolve the JVMS with a small set of changes to have an
> implementable specification of the L-world. Instead of trying to add Q/R/U-types
> to the JVMS, the approach is to extend the JVMS notion of “reference” to cover
> both regular classes and value classes. The notion of “class” has also been
> extended to cover both, but when needed, it is possible to specify an “object
> class” or a “value class”, or respectively, “an instance of an object class” vs
> “an instance of a value class”. The “Q…;” format is still used for value class
> types, but the “;Q” trick is gone.

The VM needs to know whether a class is a value type or not at the time we compile something.
Q-types are one way to encode that, but not the only way.


> 
> The attached document contains sections of the JVMS that have been modified
> to implement the L-world. The text doesn’t have change bars, so people are
> encouraged to read each modified section entirely to see if it is consistent
> and covers all cases of the L-world.
> 
> 
> Here’s a quick summary of the changes with some consequences on the HotSpot
> code:
>  - all v-bytecodes are removed except vdefault and vwithfield
>  - all bytecodes operating on an object receiver are updated to support values
>    as well, except putfield and new
>  - single carrier type for both instances of object classes and instances of
>    value classes
>  - this carrier type maps to the T_OBJECT BasicType
>  - T_VALUETYPE still exists but its usage is limited (same purpose as T_ARRAY)
>  - qtos TosState is removed
>  - JNI: the jobject type can be used to carry either a reference to an object
>    or an array or a value. The type jvaluetype, sub-type of jobject, is used
>    when only a value class instance is expected
>  - Q…; remains the way to encode value classes in signatures (fields and methods)
>  - In the constant pool, the CONSTANT_Class_info entry type is used to store a
>    symbolic reference to either an object class or a value class
>  - the ;Q escape sequence is not used anymore in value class names

Agreed, let's remove the ;Q encoding, it was a hack.

> 
> 
> One important point of this exercise is to ensure that the migration of Value
> Based Classes into Value Classes is possible, and doable with a reasonable
> complexity and costs. In addition to the JVMS update (and consistent with the
> JVMS modifications), here’s a set of proposals on how to deal with the VBC
> migration.
> 
> 
> Migration of Value Based Classes into Value Classes:
>  - challenges:
>      - signature mismatch
>      - null
>      - change in behavior

- classes are compiled at different times, so some use a given type as a value type while others use it as a reference type;
  this is a problem close to the separate compilation problem.

> 
>  - proposal for signature mismatch:
>       - with LWVT, value class types in signatures are using the Q…; format
>       - legacy code is using signatures with the L…; format (because VBC are
>         object classes)
>       - methods will have two signatures:
>         - true signature, which could include Q…; elements
>         - a L-ified signature where all Q…; elements are re-written with the L…; format
>         - method lookup still works by signature string comparisons
>         - the signature of the method being looked up will be compared against both the
>           true and the L-ified signatures; if the looked-up signature matches the L-ified
>           signature but not the true signature, it means a situation where legacy code
>           is trying to invoke migrated code has been detected, and additional work might
>           be required for the invocation (actions to be taken have to be defined)
>       - signature mismatch can also occur for fields; this is still being
>         investigated, and the proposal will be updated as soon as we have a solution
>         ready to be published
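
To make the quoted lookup rule concrete, here is a minimal sketch in plain Java (not HotSpot code). The class SignatureLookupSketch, the methods lify and match, and the Match enum are names made up for this illustration; the sketch assumes well-formed descriptors and only covers methods, since the field case is still open in the proposal.

```java
final class SignatureLookupSketch {
    // Outcome of comparing a looked-up descriptor with a method's true descriptor.
    enum Match { TRUE_SIGNATURE, LIFIED_ONLY, NONE }

    // Rewrites every Q...; class element of a descriptor as L...;.
    // Class names are copied verbatim, so a 'Q' inside a name is left alone.
    static String lify(String descriptor) {
        StringBuilder out = new StringBuilder(descriptor.length());
        int i = 0;
        while (i < descriptor.length()) {
            char c = descriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                int end = descriptor.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated class type: " + descriptor);
                }
                out.append('L').append(descriptor, i + 1, end + 1);
                i = end + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // LIFIED_ONLY is the "legacy code invokes migrated code" situation:
    // the caller's descriptor matches only after L-ification.
    static Match match(String trueDescriptor, String lookedUp) {
        if (lookedUp.equals(trueDescriptor)) {
            return Match.TRUE_SIGNATURE;
        }
        if (lookedUp.equals(lify(trueDescriptor))) {
            return Match.LIFIED_ONLY;
        }
        return Match.NONE;
    }
}
```

With a true descriptor of (QV;)LW;, a legacy caller looking up (LV;)LW; gets LIFIED_ONLY, which is where the extra work mentioned in the proposal would be triggered.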

legacy code => legacy classes, each of them in a different state,
so you will end up with more than two signatures.

Suppose you have two classes V and W that will become value types, but they come from different modules managed by different companies.
You can have 3 classes like this:
  class A { LW; m(LV;) }
  class B extends A { LW; m(LV;) }
  class C extends B { LW; m(LV;) }

then you migrate B to declare V as a value type
  class B extends A { LW; m(QV;) }
then you migrate A to declare W as a value type
  class A { QW; m(LV;) }
here, m has three different descriptors across A, B and C, so there are more than two true signatures.

Let's take a step back. In Java (the language), we have already introduced features like generics and varargs that behave the same way;
the only difference here is that this is something the VM has to deal with, not something the Java compiler has to deal with.

How do varargs work in the Java compiler? It's easy: the parameter in the method descriptor is an array, and an access flag says whether it is varargs or not.
I think we should encode Q-types the same way.
As I said, for a class, we want to know which of the types it uses were value types at the time the class was compiled.
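
The varargs analogy can be observed with reflection. The small sketch below uses String.format(String, Object...) from the JDK; the class name VarargsFlagDemo and the helper formatMethod are made up for this illustration.

```java
import java.lang.reflect.Method;

final class VarargsFlagDemo {
    // Looks up String.format(String, Object...). Note that the lookup itself
    // has to name the parameter as Object[]: the descriptor only knows an array.
    static Method formatMethod() {
        try {
            return String.class.getMethod("format", String.class, Object[].class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // The "is it varargs" information is carried separately from the descriptor,
    // by the ACC_VARARGS access flag, which reflection exposes as isVarArgs().
    static boolean isVarargs(Method method) {
        return method.isVarArgs();
    }
}
```

The descriptor stays the same whether or not the method is varargs; only the flag differs. The proposal below applies the same split to value types: L-descriptors everywhere, plus side information saying which types are values.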

So I propose to introduce a new class attribute named ValueTypes that contains the set of all value types used by that class.
For the VM, a type is then a reference type (an L-type) by default, unless it is listed in the ValueTypes attribute, in which case it is a value type.
Basically, we have one bit per type at class level that says whether the type behaved as a value type or not when the class was compiled.

In terms of implementation:
  - methods only have one descriptor, using L-types
  - the verifier rejects code that stores null, uses synchronized, etc. with a type listed in the class attribute ValueTypes
  - at runtime, when loading a class listed in the ValueTypes attribute, the VM checks that it really is a value type
  - when creating the object layout of a class, a field whose type is listed in ValueTypes may be flattened
  - when creating an array, an array whose element type is listed in ValueTypes may be flattened
  - when calling a method or accessing a field, after resolution, the interpreter may do an adaptation in order to buffer (value type -> ref) or nullcheck (ref -> value type)
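
Here is a rough model of the last bullet in plain Java, not VM code. Every name in it (ValueTypesSketch, ClassView, treatsAsValue, adapt, Adaptation) is made up for this illustration, and the ValueTypes attribute is modelled as a simple set of type names per class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ValueTypesSketch {
    // What the interpreter might have to do when a value crosses between two classes.
    enum Adaptation { NONE, BUFFER, NULL_CHECK }

    // Hypothetical view of one class file: the names listed in its ValueTypes attribute.
    static final class ClassView {
        private final Set<String> valueTypes;

        ClassView(String... valueTypes) {
            this.valueTypes =
                Collections.unmodifiableSet(new HashSet<>(Arrays.asList(valueTypes)));
        }

        // A type is a reference type by default, a value type only if listed.
        boolean treatsAsValue(String typeName) {
            return valueTypes.contains(typeName);
        }
    }

    // Adaptation needed when a value of typeName flows from one class to another.
    static Adaptation adapt(ClassView from, ClassView to, String typeName) {
        boolean fromValue = from.treatsAsValue(typeName);
        boolean toValue = to.treatsAsValue(typeName);
        if (fromValue == toValue) {
            return Adaptation.NONE;            // both sides agree, nothing to do
        }
        return fromValue
            ? Adaptation.BUFFER                // value type -> ref
            : Adaptation.NULL_CHECK;           // ref -> value type
    }
}
```

In the A/B/C example above, a migrated B would be new ClassView("V") and a legacy C would be new ClassView(); passing a V from C to B needs a null check, and passing it from B to C needs buffering.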

> 
>  - proposal for null references leaking to migrated code
>      - having a null reference for a Value Based Class variable or field is valid
>        in legacy code but it becomes invalid when the Value Based Class has been
>        migrated to a Value Class
>      - trying to prevent all references with a value class type from getting a
>        null value would be very expensive (it would require looking at the
>        stackmap for each assignment to a local variable)
>      - the proposed solution is to allow null references for local variables and
>        expression stack slots, but forbid them for fields or array elements
>        (bytecodes operating on fields and arrays have to be updated to throw a
>        NPE whenever a null reference is provided instead of a value class
>        instance)
>      - null references are likely to be an issue for JIT optimizations like
>        passing values in registers when a method is invoked. The proposed
>        solution is to only allow null references for value classes in legacy
>        code, by detecting them and blocking them when leaking to migrated code.
>        The detection can be done at invocation time, when a mismatch between the
>        signature expected by the caller and the real signature of the callee is
>        detected (see signature mismatch proposal above)
>      - the null reference should also be detected and blocked when it is used as
>        a return value and the type of the value to be returned is a value class
>        type
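
The null rules quoted above can be modelled in a few lines of plain Java. This is only an illustration of where the checks would sit, not how HotSpot would implement them; NullPolicySketch, storeElement and crossBoundary are made-up names, and the signatureMismatch flag stands for the result of the signature comparison described earlier.

```java
import java.util.Objects;

final class NullPolicySketch {
    // Models a store into an array whose element type is a value class:
    // null is rejected with a NullPointerException, as proposed for fields and arrays.
    static void storeElement(Object[] valueArray, int index, Object ref) {
        valueArray[index] = Objects.requireNonNull(ref, "null stored into a value class array");
    }

    // Models an argument or return value crossing an invocation boundary.
    // The null check only runs when a signature mismatch was detected, i.e. when
    // legacy code talks to migrated code; otherwise the reference passes through.
    static Object crossBoundary(Object ref, boolean signatureMismatch) {
        if (signatureMismatch) {
            Objects.requireNonNull(ref, "null leaking from legacy code into migrated code");
        }
        return ref;
    }
}
```

Local variables and stack slots have no counterpart here on purpose: under the proposal they may hold null, so no check is modelled for them.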

I believe what I'm proposing above addresses all these points.

> 
> 
> In addition to the JVMS update, here’s a chart trying to summarize the new
> checks that will have to be added to existing bytecodes when moving the
> v-bytecode semantics into the a* bytecodes. The categories in the chart are
> not very precise, but we can use it as a starting point for our discussions.
> The chart can also help define which experiments could be done to estimate the
> costs of the different additional checks that need to be added to existing
> bytecodes.
> 
> All these are preliminary works for a proposal to implement the L-world value
> types and not a definitive specification. This has to be analyzed and
> discussed before any attempt to implement it starts.
> Feel free to send feedback, comments, other proposals, etc.
> 
> Thank you,
> 
> Fred

Rémi

