Collapsing the requirements
brian.goetz at oracle.com
Sat Aug 3 16:37:56 UTC 2019
As Remi noted, we had some good discussions at JVMLS this week. Combining that with some discussions John and I have been having over the past few weeks, I think the stars are aligning to enable us to dramatically slim down the requirements. The following threads have been in play for a while:
- John: I hate the LPoint/QPoint distinction
- Brian: I hate null-default types
- Remi: I hate the V? type
But the argument for each of these depended, in some way, on the others. I believe, with a few compromises, we can now prune them as a group, which would bring us to a much lower energy state.
## L^Q World — Goodbye `LV;`
We’ve taken it as a requirement that for a value type V, we have to support both LV and QV, where LV is the null-adjunction of QV. This has led to a lot of complexity in the runtime, where we have to manage dual mirrors.
The main reason why we wanted LV was to support in-place migration. (In Q-world, LV was the box for QV, so it was natural for migration.) But, as we’ve worked our migration story, we’ve discovered we may not need LV for migration. And if we don’t, we surely don’t need it for anything else; worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we’re worried about erasure and overloading, to something like `LObject//V` using John’s type-operator notation.)
Assuming we can restructure the migration story to not require LV to represent a VM-generated “box” (which I believe we can; see below), we can drop the requirement for LV. An inline class V gives rise to a single type descriptor, QV (or whatever we decide to call it; John may have plans here.)
## Goodbye `V?`
The other reason we wanted LV was that it was the obvious representation for the language type `V?` (V adjoined with null.) Uses for `V?` include:
- Denoting non-flattened value fields;
- Denoting non-flattened value arrays;
- Denoting erased generics over values (`Foo<V?>`);
- Denoting the type that is the adjunction of null to V (V | Null), when we really want to talk about nullability.
But we can do all this without a `V?` type; for every V, there is already at least one supertype of V that includes `V|Null`: Object, and any interface implemented by V. If we arrange that every value type V has a supertype V’, not implemented by any other type, then the value set of this V’ is exactly that of `V?`. And we can use V’ to do all the things `V?` did with respect to V, including subtyping. The language doesn’t need the `?` type operator, it just needs to ensure that V’ always exists. Which turns out to be easy, and also turns out to be essential to the migration story.
We can formalize this by requiring that every value type have a companion interface (or abstract class) supertype. Define an envelope-class pair (“eclair”) as a pair (V, I) such that:
- V is an inline class
- I is a sealed type
- I permits V (and only V)
- V <: I
(We can define eclairs for indirect classes, but they are less interesting — because indirect classes already contain null.)
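The four conditions above can be approximated in released Java as a rough sketch. Inline classes do not exist there, so a record stands in for V; the names `PointBox`, `Point`, and `EclairDemo` are made up for illustration, and sealed types need Java 17 or later.

```java
// Sketch only: a record stands in for an inline class (an assumption),
// and all names here are hypothetical.

// I: the sealed half of the eclair. It permits V and only V.
sealed interface PointBox permits Point {
    int x();
    int y();
}

// V: the value half, with V <: I.
record Point(int x, int y) implements PointBox {}

final class EclairDemo {
    // Because PointBox permits only Point, every non-null PointBox is a Point.
    static String describe(PointBox box) {
        if (box == null) {
            return "null";
        }
        Point p = (Point) box; // cannot fail for a non-null box
        return "Point(" + p.x() + ", " + p.y() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1, 2)));
        System.out.println(describe(null));
    }
}
```

The value set of `PointBox` is exactly the `Point` values plus null, which is the property the eclair definition is after.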
If every value type is a member of an eclair, we can use V when we want the flattenable, non-nullable, specializable type; and we use I when we want the non-flattenable, nullable, erased “box”. We don’t need to denote `V?`; we can just use I, which is an ordinary, nominal type.
Note that the VM can optimize eclairs about as well as it could for LV; it knows that I is the adjunction of null to V, so that all non-null values of I are identity free and must be of type V.
What we lose relative to `V?` is access to fields; it was possible to do `getfield` on an LV, but not on I. If this is important (and maybe it’s not), we can handle this in other ways.
#### With sugar on top, please
We can provide syntax sugar (please, let’s not bikeshed it now) so that an inline class C _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other supertypes) of C are lifted. For sake of exposition, let’s say this is called `C.Box`, and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.) We’ve been here before, and abandoned it because “Box” seemed misleading, but let’s call it that for now. And now it is a real nominal type, not a fake type. In the simplest case, merely declaring an inline class V could give rise to `V.Box`.
Now, the type formerly known as `V?` is an ordinary, nominal interface (or abstract class) type. The user can say what they mean, and no magic is needed by either the language or the VM. Goodbye `V?`.
#### Boxing conversion
Given the constraints of the eclair relationship, it would be reasonable for the compiler to derive from this that there is a boxing conversion between C and I (I is just the value set of C, plus null, which is the relationship boxes have with their corresponding primitives.) The boxing operation is a no-op (since C <: I) and the unboxing operation is a null-checking cast.
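Written out by hand, the two conversions might look like the sketch below. This is an illustration in released Java (17+), not what the compiler would actually emit; `MoneyBox`, `Money`, and `BoxingDemo` are hypothetical names, and a record again stands in for the inline class.

```java
import java.util.Objects;

// Hypothetical eclair: MoneyBox is I, Money is the value half C.
sealed interface MoneyBox permits Money {}

record Money(long cents) implements MoneyBox {}

final class BoxingDemo {
    // Boxing: C <: I, so this is a plain widening conversion with no work to do.
    static MoneyBox box(Money m) {
        return m;
    }

    // Unboxing: a null check followed by a cast that cannot fail,
    // because Money is the only permitted implementation of MoneyBox.
    static Money unbox(MoneyBox b) {
        return (Money) Objects.requireNonNull(b, "cannot unbox null");
    }
}
```

Unboxing a null box throws `NullPointerException`, which matches how unboxing a null `Integer` to `int` behaves today.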
#### Erased generics
Using the eclair wrapper also kicks the problem of erased generics down the road; if we use `Foo<I>` for erased generics, and temporarily ban `Foo<V>`, when we get to specialized generics, it will be obvious what `Foo<V>` means (their common supertype will be `Foo<? extends I>`). This is a less confusing world, as then “List of erased V” and “List of specialized V” don’t coexist; there’s only “List of V” and “List of V’s Box”.
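The subtyping claim can be seen with ordinary wildcards in released Java. In the sketch below, `List<Coord>` is of course erased rather than specialized, so it only models the typing relationship, not the layout; `CoordBox`, `Coord`, and `GenericsDemo` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical eclair: CoordBox is I, Coord is V (a record as a stand-in).
sealed interface CoordBox permits Coord {}

record Coord(int x) implements CoordBox {}

final class GenericsDemo {
    // Accepts both "List of V" and "List of V's Box" through their
    // common supertype List<? extends CoordBox>.
    static int countNonNull(List<? extends CoordBox> items) {
        int n = 0;
        for (CoordBox item : items) {
            if (item != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<CoordBox> boxes = Arrays.asList(new Coord(1), null); // nullable elements
        List<Coord> values = List.of(new Coord(1), new Coord(2)); // no nulls
        System.out.println(countNonNull(boxes));
        System.out.println(countNonNull(values));
    }
}
```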
## Migration
The ability to migrate Optional and friends to values has been an important goal, but it has been the source of significant complexity. Our previous story leaned hard on “When we migrate X to a value, LX will describe the box, so old callsites will continue to link.” But it turned out that brought a lot of baggage (forwarding bridges, null-default values) and compromises (null-default values lose their calling-convention optimizations), and over the past few weeks John and I have been cooking up a simpler eclair-based recipe for this.
The world is indeed full of existing utterances of `LOptional`, and they will still want to work. Fortunately, Optional follows the rules for being a value-based class. We start with migrating Optional from a reference class to an eclair with a public abstract class and a private value implementation. Now, existing code just works (source and binary) — and optionals are values. But, this isn’t good enough; existing variables of type Optional are not flattened.
One of the objections raised to in-place migration was nullity; in order to migrate Optional to a true value, it would have to be a null-default value, and this already entailed compromises. If we’re willing to compromise further, we can get what we want without the baggage. And that compromise is: give up the name.
So we define a new public value class `Opt<T>` which is the value half of the eclair, and the existing Optional is the interface/abstract class half. Now, existing fields / arrays can migrate gradually to Opt, as they want the benefit of flattening; existing APIs can continue to truck in Optional (which has about the same optimizations as a null-default value would have on the stack.)
This works because of the boxing conversion. Suppose we have old code that does:
    Optional o = makeAnOptional()
when the user changes this to
    Opt o = …
the compiler sees the RHS is an Optional and the LHS is an Opt, and there is a boxing conversion between them, so we insert an unbox conversion (null check) and we’re done. Users can migrate their fields gradually. The cost: the good name gets burned. But there is a compatible migration path from ref to value.
Later, when we have bridges (we don’t need them yet!), we can migrate the library uses from Optional to Opt.
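To make the Optional/Opt pairing concrete, here is a rough model in released Java (17+). We cannot modify `java.util.Optional`, so the hypothetical `OptionalLike<T>` stands in for the migrated abstract class, and a final class stands in for the inline class `Opt<T>`. The `unbox` helper spells out the conversion that the compiler would insert implicitly under this proposal.

```java
import java.util.NoSuchElementException;
import java.util.Objects;

// Stand-in for the migrated Optional: the nullable, erased half of the eclair.
abstract sealed class OptionalLike<T> permits Opt {
    abstract boolean isPresent();
    abstract T get();

    static <T> OptionalLike<T> of(T value) {
        return new Opt<>(Objects.requireNonNull(value));
    }

    static <T> OptionalLike<T> empty() {
        return new Opt<>(null);
    }
}

// Stand-in for the new value class (a final class, since inline classes
// are not available in released Java).
final class Opt<T> extends OptionalLike<T> {
    private final T value; // null means empty

    Opt(T value) {
        this.value = value;
    }

    @Override
    boolean isPresent() {
        return value != null;
    }

    @Override
    T get() {
        if (value == null) {
            throw new NoSuchElementException("empty");
        }
        return value;
    }
}

final class MigrationDemo {
    // Old API, unchanged: it still deals in the abstract class.
    static OptionalLike<String> makeAnOptional() {
        return OptionalLike.of("hello");
    }

    // The unbox conversion: a null check plus a cast that cannot fail.
    static <T> Opt<T> unbox(OptionalLike<T> boxed) {
        return (Opt<T>) Objects.requireNonNull(boxed);
    }
}
```

A caller that has migrated its variable would then effectively run `Opt<String> o = MigrationDemo.unbox(MigrationDemo.makeAnOptional());`, while unmigrated callers keep using `OptionalLike<String>` unchanged.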
## Null-default values
About 75% of the motivation for null-default values (another huge source of complexity) was to support the migration of value-based classes. And it wasn’t even a great solution, because we still lost some key optimizations (e.g., calling conventions.) With the Optional -> Opt path, we don’t need null-default values; we get ordinary values. So while we pay the cost of changing the name, we gain the benefit that the new values, once the full migration is effected, don’t carry the legacy performance baggage.
Another 20% of the motivation was for security-sensitive classes whose default value did not represent a useful value, for which we wanted not null-default-ness but really initialization safety. There are a few other ways to get there.
One is to treat this problem as protecting such classes from uninitialized fields or array elements; another is to ensure that such classes (a) have no public fields and (b) perform the correct check at the top of each method (which can be injected by the compiler.) I don’t want to solve that problem right here, but I think there are enough ways to get there that we can assume this isn’t a hard requirement.
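The second option, (a) plus (b), could look roughly like the hand-written sketch below. A final class stands in for the inline class, and `defaultValue()` models the all-zero value the VM would supply for an uninitialized field or array element. `Token`, `defaultValue`, and `checkInitialized` are hypothetical names, and the `valid` flag is only one possible way to recognize the default value.

```java
// Sketch only: models an inline class whose default (all-zero) value
// must never be usable.
final class Token {
    // (a) no public fields
    private final long secret;   // 0 in the default value
    private final boolean valid; // false in the default value

    private Token(long secret, boolean valid) {
        this.secret = secret;
        this.valid = valid;
    }

    // The only way to obtain a properly initialized Token.
    static Token of(long secret) {
        return new Token(secret, true);
    }

    // Models the zeroed default value; not something user code would call.
    static Token defaultValue() {
        return new Token(0L, false);
    }

    // (b) the check that a compiler could inject at the top of each method
    private void checkInitialized() {
        if (!valid) {
            throw new IllegalStateException("use of uninitialized Token");
        }
    }

    long secret() {
        checkInitialized();
        return secret;
    }
}
```

With this shape, `Token.of(42L).secret()` returns the stored value, while calling `secret()` on the default value fails fast instead of leaking a meaningless zero.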
The other 5% was just the user-based “I want null in my value set.” For those, we can tell users: use the interface box when you need null.
In one swoop, we can banish LV from the VM, V? from the language, and null-default values, by making a simple requirement: every value type is paired with an interface or abstract class “box”. For most values, this can be automatically generated by the compiler and denoted via a well-known name (e.g., V.Box); for some values, such as those that are migrated from reference types, we can explicitly declare the box type and pick explicit names for both types.
There’s a lot to work out, but I think it should be clear enough that this is a much, much lower energy state than what we were aiming at for L10, and also a simpler user model.
Let’s focus discussions on validating the model first before we dive into mechanism or surface syntax.