LWorld1 initial phase proposal

John Rose john.r.rose at oracle.com
Thu Dec 14 01:29:31 UTC 2017

Nice!!  I'm going to tear through this with a bunch of comments.

(But anything having to do with optimizations specific to HotSpot
will go to valhalla-dev.)

On Dec 13, 2017, at 11:58 AM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
> Folks,
> Based on Dan Smith’s write-up: http://cr.openjdk.java.net/~dlsmith/values-notes.html of a possible
> LWorld direction, we’ve written a proposal for a potential initial LWorld1 prototype and would appreciate review.
> We can walk through at our next meeting Dec 20th.
>                            JVM Lworld Value Types Phase I
> http://cr.openjdk.java.net/~dlsmith/values-notes.html
> I. New terminology: for LWorld
>  L-Type as reference or value type
>  Q-Type as value type
>  R-Type as reference only - open question - do we need this?

For the JVM's descriptor language, we need the legacy syntax for L-types and
maybe a new syntax for Q-types:  TBD.  But for syntax it's conceivable that we
only need this:

L-Descriptor: the descriptor of any reference type or value type.

For talking about the values and object references flowing around inside the
JVM and through its data structures, we need terminology to distinguish
values from objects.  For that, we are currently leaning heavily on the word
"value" and the opposed word "object":

value instance =  entity in the JVM w/o object identity, and not a reference
value class = class whose instances are value instances (disregarding possibility of boxes)
object instance = entity in the JVM w/ object identity, manipulated by reference
object class = class whose instances are object instances


null = the unique reference which does not point to any object instance
reference = the identity of an object instance, or else null
value = ambiguous for (a) a value instance, or (b) any entity the JVM can store in some variable
(So use the unqualified term "value" with caution; context must determine.)


value type = the type of a variable which can hold some value instances but not any references
reference type = the type of a variable which can hold some references but not any value instances
open/general/union type = the type of a variable which can hold some mix of value instances and references (incl. null)

I don't think we have settled on a term for open/general/union types.

The JVM has nothing called an "object type" because any JVM
variable that can refer to an object is in fact a nullable reference.
An "object type", if it were a useful thing, would be the type of a
non-null reference variable.  We can agree to use it as a synonym
to "reference type" or as a non-nullable reference, when being
careful about nulls.  Context will determine, as with the term "value".

Given those distinctions, we can then use the following shorthand terms
in our conversations, shorthands which are *inspired* by the JVM descriptor
syntax, but are not necessarily identical to it:

R-type: a reference type
Q-type: a value type
U-type: an open/general/union type
L-type: same thing as a U-type, as long as we are in the context of L-world

We can say "U-type" or "open/general/union type" when we want to be
clear that some entity in the JVM can be either a value instance or reference.
We can say "R-type" when we want to be clear that value instances are
excluded.  In the simplest version of L-world, R-types don't appear to play
a large role; if a variable can carry an object reference, it can also carry
a value.

Anyway, that gives us the following additional options for descriptors:

Q-descriptor:  the descriptor of a Q-type, when context could interpret an L-descriptor differently (TBD)
R-descriptor:  the descriptor of an R-type, when context could interpret an L-descriptor differently (TBD)
U-descriptor:  the descriptor of an L-type, when context could interpret an L-descriptor differently (TBD)

We only need these options if there are contexts that require them.
It is probably better to try to avoid such contexts in the first place.
The way to do that is to do everything with L-descriptors, and supply
contextual bits when those L-descriptors should be interpreted as
Q-types or R-types.

Example:  A field of value type either needs an explicit flatten flag,
or a Q-descriptor.  The flatten flag (ACC_VALUE) makes an unambiguous
context for the L-descriptor of the field type, so that it means a Q-type
rather than an L-type.

> II. Assumptions:
> 1. New root: LObject - for all references and value types

java.lang.Object is a U-type.
Object classes (other than java.lang.Object) define R-types.
Value classes define Q-types.
Interfaces define U-types.
Abstract classes define R-types.
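
To make the mapping above concrete, here is a rough sketch in today's Java.
TypeKind, TypeKindSketch and kindOf are hypothetical names, not part of any
JVM or JDK API, and the isValueClass flag is passed in by hand because the
current JVM has no value classes to ask about.  It assumes the argument is
an ordinary class or interface (not a primitive or array type).

```java
// Hypothetical names throughout; this only restates the list above.
enum TypeKind { R_TYPE, Q_TYPE, U_TYPE }

final class TypeKindSketch {
    // isValueClass is supplied by the caller: an assumption standing in for
    // whatever bit the JVM would eventually record on a value class.
    static TypeKind kindOf(Class<?> c, boolean isValueClass) {
        if (c == Object.class || c.isInterface()) {
            return TypeKind.U_TYPE;   // may hold value instances or references
        }
        if (isValueClass) {
            return TypeKind.Q_TYPE;   // value instances only, never null
        }
        return TypeKind.R_TYPE;       // object classes, including abstract ones
    }

    public static void main(String[] args) {
        System.out.println(kindOf(Object.class, false));
        System.out.println(kindOf(Runnable.class, false));
        System.out.println(kindOf(Number.class, false));
    }
}
```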

The following are possibilities but I hope most of them can be false:

Perhaps some object classes will *also* imply frozen versions which are Q-types; TBD.
Perhaps some value classes will *also* imply heavy-boxed versions which are R-types; TBD.
Perhaps some interfaces will be *restricted* to Q-types or R-types; TBD.
Perhaps some abstract classes can help define Q-types or U-types; TBD.

> 2. Value Type characteristics:
>   immutable (not all the way down)


In terms of its component fields, a value instance is shallowly immutable,
not deeply immutable.  The JDK term is "unmodifiable".  In terms of
"codes like a class", the value class is a value-based class.

In terms of its containing variable, a value instance may be replaced by
another value instance, unless (of course) the containing variable is final.
In terms of "works like an int", you have to overwrite the whole value in
its variable.

You cannot individually address and update fields within the value, just as
you cannot address and update individual bits within an int.  This is the
JVM level view; at the language level you might have operators like
"&=" or "++" which might *seem* to address and update part or all of the
value itself, independently of its containing variable.  At the JVM level,
there must always be a write-back of the new value to the old variable.
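
The write-back pattern can be sketched in today's Java with an ordinary
final class standing in for a value class.  PointSketch, withX and bumpX
are hypothetical names; withX only imitates what the proposed withfield
bytecode would do, and the class still has identity on a current JVM.

```java
// Hypothetical stand-in for a value class: final class, final fields.
final class PointSketch {
    final int x;
    final int y;

    PointSketch(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Imitates withfield: builds a new instance, never mutates this one.
    PointSketch withX(int newX) {
        return new PointSketch(newX, y);
    }
}

final class WriteBackSketch {
    static PointSketch bumpX(PointSketch p) {
        // A source-level "p.x++" would have to lower to this shape:
        // read the field, compute, build a whole new value, store it back
        // into the containing variable.
        p = p.withX(p.x + 1);
        return p;
    }

    public static void main(String[] args) {
        PointSketch before = new PointSketch(1, 2);
        PointSketch after = bumpX(before);
        System.out.println(before.x + " -> " + after.x);
    }
}
```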

>   not nullable

(Thus, null is never a value instance, only a reference.)

>   no identity

(Comment:  Value types enforce a superset of the constraints on value-based
classes.  In that way, VBC's anticipate VTs.)

>   flattenable when contained in reference, value type or array

"contained in reference" is ambiguous.  I suggest this language:

flattenable when contained in a variable (instance field, static field, array element, or local)

>   no default box (buffer, not box)
>      - if you want identity, create a reference storing a value type field

I think "create an object instance" is less ambiguous than "create a reference".

There are use cases for boxes, and the troubling legacy types like java.lang.Integer,
but I agree that we should push back on them and try to get away without them.
See above where I put a TBD on value classes defining secondary R-types.

>   support interfaces

Yes.  Interfaces define U-types.  (TBD whether there are any variations on this theme.)

And we can say java.lang.Object is an "honorary interface".  It is a U-type
which all classes implement.  Like interfaces, it is a valid bound for all types
and all generic variables.

> III. goals:
>    existing interfaces should be implementable by both references and value types
>     - without requiring recompilation
>    existing code should be able to handle both references and value type arguments
>     - without requiring recompilation

(This is a big task.  If we pull it off, we enjoy the simplicity of L-world
relative to U-world.  If not, we have to deal with the complexity of
defining Q/U/L-types and the bridges between them.)

>    Migration:
>      LType -> QType migration
>        author must opt-in: by declaring in source (language policy)
>           - requires recompilation

Specifically, there is object class to value class migration.
Or, emphasizing the object-ness of the original class,
R-type to Q-type class migration.

For descriptors (as opposed to classes), there might be
no migration, if there are no new descriptor spellings.
But if we do any (non-empty) subset of Q/R/U-descriptors,
there are associated migration stories.

(Note that R-type to U-type migration is probably cheap or
a no-op in L-world, since R-types in the old world and U-types
in the new world are both spelled with L-descriptors.
Indeed, that's the point of L-world.)

>      Any existing class that meets the requirements could become a value type
>      ValueBasedClasses are candidates

Yes.  (Spelling: "value-based classes".)  See:

>      LWorld Value Type Static restrictions (for javac)
>         j.l.Object as the only superclass

(Yes; bookmark a later question of value classes which are also abstract.)

Value instances must be protected from identity operations, and generic
code which uses identity operations must be protected from ill effects of
value instances.  This involves a web of restrictions and compromises
in the JLS, JVMS, and JDK.

>         no use of identity:
>            no use of Object methods invalid for value types: wait/notify*/finalize

This is a JDK restriction, in the relevant native methods.  The restriction
on finalize is a JVM restriction.

>         no assumptions of nullability

This turns into a restriction where U-types are narrowed to Q-types.
Any such narrowing operation must throw NPE for nulls.
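
The closest thing in today's Java is unboxing, which narrows a nullable
Integer reference to an int and throws NPE on null.  The sketch below
shows that, plus a hypothetical explicit helper (narrow is an invented
name, not a proposed API) doing the same check for a U-typed variable.

```java
import java.util.Objects;

final class NarrowingSketch {
    // Existing behavior: unboxing throws NullPointerException on null.
    static int unbox(Integer boxed) {
        return boxed;   // implicit intValue() call
    }

    // Hypothetical explicit form of the null check on narrowing.
    static <T> T narrow(Object u, Class<T> valueClass) {
        Objects.requireNonNull(u, "null is not a value instance");
        return valueClass.cast(u);
    }

    public static void main(String[] args) {
        System.out.println(unbox(42));
        try {
            narrow(null, String.class);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```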

>         no non-private constructors

This is a JLS restriction.  The JVM should simply outlaw <init> methods
altogether.  "I don't think that word means what you think it means."

>         no use of reference-only bytecodes: new, putfield, monitorenter/exit, aconst_null

These don't fit together easily.  new and aconst_null produce R-types.
That doesn't seem to need a restriction.

putfield and monitorenter/exit consume R-types.  That needs a restriction against
Q-types.  It must be dynamically applied, at least in the case of L-Object
and L-interface.  Perhaps we need a new exception, NotAnObjectException.
(Cue bikeshed team.  Don't wanna call it Value*Exception, please.)

Actually, putfield is linked against a class which (at link time, no earlier)
is detected to be either an object class or a value class.  If the latter, a
LinkageError needs to get thrown, not NotAnObjectException.
That might be NotAnObjectError <: LE.

(If we do Q-types derived from object classes, then putfield *will* need
a runtime check *also*, to detect frozen Q-versions of R-objects.
Bookmark that for later.)
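
For concreteness, the two failure modes might be shaped like this.  Both
class names are the placeholders floated above and do not exist in the
JDK.  Making the dynamic one a RuntimeException is my assumption here,
and requireObject is a hypothetical stand-in for the check a
monitorenter would perform on an operand typed as Object or an interface.

```java
// Hypothetical: dynamic failure, assumed unchecked for this sketch.
class NotAnObjectException extends RuntimeException {
    NotAnObjectException(String message) {
        super(message);
    }
}

// Hypothetical: link-time failure, e.g. putfield resolved against a value class.
class NotAnObjectError extends LinkageError {
    NotAnObjectError(String message) {
        super(message);
    }
}

final class IdentityCheckSketch {
    // isValueInstance is passed in by the caller because today's JVM
    // cannot answer that question about an arbitrary operand.
    static Object requireObject(Object operand, boolean isValueInstance) {
        if (isValueInstance) {
            throw new NotAnObjectException("identity operation on a value instance");
        }
        return operand;
    }
}
```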

>       LWorld Static restrictions (for verifier) TBD
>         no use of reference-only bytecodes: new, putfield, monitorenter/exit, aconst_null

No, these are link-time, not verify-time restrictions.  The verifier must not be
required to load classes just to determine the value-vs-object distinction.
If we want/need to verify value-vs-object, we need new instruction bits
or new descriptors (Q-descriptors) to express that to the verifier w/o class
loading.  Conversely, we can avoid Q-descriptors if we don't require the
verifier to make those distinctions.  I would prefer the latter.  Is it possible?

>       LWorld new Dynamic restrictions (based on dynamic type)
>         for Value Types: no use of reference-only bytecodes: new, putfield, monitorenter/exit, aconst_null
>         for Reference Types: no use of value-type-only bytecodes: vdefault, withfield

If a class can be only an object class or a value class (and never both), then a bytecode that
links to that class can know, at link time, whether it is operating on objects or
values.  That is (probably) the case with vdefault and withfield.  (It might
get more complex in the future if we fool around with deriving Q-types from
R-classes, aka object classes.)

>         no use of if_acmpeq/ne without following .equals call

That's a JLS restriction, I think.  At the JVMS level we simply say that acmp
treats values like op==(float,float) treats NaNs.  Or maybe say it rolls a die if
it can predict that the two value operands are "the same" in some internal sense
of substitutability, else returns false.  I prefer the first, even though the second
is a little easier to implement.
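
The NaN precedent is easy to see in today's Java: primitive == says a NaN
is not equal to itself, while Float.equals says it is.  The analogy to
acmp on value instances is the proposal's; the helper names below are
just for illustration.

```java
final class NaNCompareSketch {
    // Primitive comparison: false whenever either operand is NaN.
    static boolean primitiveEq(float a, float b) {
        return a == b;
    }

    // Float.equals compares bit patterns, so NaN equals NaN here.
    static boolean boxedEquals(float a, float b) {
        return Float.valueOf(a).equals(Float.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(primitiveEq(Float.NaN, Float.NaN));
        System.out.println(boxedEquals(Float.NaN, Float.NaN));
    }
}
```

So code that wants a real answer has to follow the cheap comparison with
an equals call, which is the shape the quoted restriction is after.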

> IV: Non-goals:
>     Primitives as value types - future phase

(Yes, another bookmark.)

> V. Expected Behaviors:
> 1. Object methods
>   wait/notify/etc: if QType: throw ICCE (IncompatibleClassChangeError)
>   getClass: normal behavior (no ambiguity with no default box)
>   toString: nothing special
>   clone: nothing special (must implement cloneable)

You are mostly right about that one.  The value might choose to clone object
substructure, using a copy method which might be named "clone".

But implementing Cloneable is not the same as implementing clone.
They are different requests.  And "Cloneable" gates "do a shallow
copy of my fields in the default Object.clone method".  That is the
object version of simple value instance assignment.  In other words,
all values are Cloneable for free.  So it doesn't make any sense to
require a VT to implement Cloneable, unless that is just a gating
factor for allowing Object.clone to return the same value.  Arguably
value types should not be allowed to implement Cloneable, unless
there is some use case for that (awful thing) as a marker of some
sort.  Maybe it's a laissez-faire thing here; allow it.
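
The gating role of Cloneable is visible in today's Java: Object.clone
throws CloneNotSupportedException unless the class carries the marker,
whether or not the class declares a copy method of its own.  The class
and method names below are hypothetical.

```java
// Declares a copy method but lacks the marker: Object.clone refuses.
final class NoMarkerSketch {
    Object copy() {
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            return null;   // signals "shallow copy not permitted"
        }
    }
}

// Carries the marker: Object.clone does a shallow field copy.
final class WithMarkerSketch implements Cloneable {
    int field = 7;

    WithMarkerSketch copy() {
        try {
            return (WithMarkerSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            return null;   // not expected, since the marker is present
        }
    }
}
```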

>   finalize: ICCE

As with clone, maybe we can be laissez-faire.  "Sure, define that
method, but don't expect me to call it ever, because you don't have
an identity which can become unreachable."

>   equals: if QType - component-wise equals (ucmp API? ucmp bytecode?) (overridable)

IMO the component-wise default should be a JDK feature, but not a
JVM feature.  Maybe we want System.isSubstitutable(x,y) and
System.substitutabilityHashCode(x) as JDK functions.  I don't think
we really need a ucmp bytecode, just an acmp bytecode which treats
values like NaNs.  So, bookmark ucmp for later.
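
As a sketch of what a JDK-level component-wise default would compute,
here is the hand-written equivalent for a two-field class.  ComplexSketch
is a hypothetical example class; System.isSubstitutable and
System.substitutabilityHashCode remain only suggestions and are not
called here.  The hash is derived from the same components as equals, so
the two stay consistent.

```java
import java.util.Objects;

// Hypothetical stand-in for a value class with two components.
final class ComplexSketch {
    final double re;
    final double im;

    ComplexSketch(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Component-wise comparison; Double.compare treats NaN as equal to
    // NaN and distinguishes 0.0 from -0.0, matching Double.equals.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComplexSketch)) {
            return false;
        }
        ComplexSketch c = (ComplexSketch) o;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    // Built from the same components that equals inspects.
    @Override
    public int hashCode() {
        return Objects.hash(re, im);
    }
}
```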

>   hashcode: TBD - needs to be based on equals


(More later.)

— John
