Wildcards -- Models 4 and 5

Brian Goetz brian.goetz at oracle.com
Fri May 20 18:33:00 UTC 2016

In the 4/20 mail “Wildcards and raw types: story so far”, we outlined 
our explorations for fitting wildcard types into the first several 
prototypes. The summary was:


  * Model 1: no wildcards at all
  * Model 2: a pale implementation of wildcards, with lots of problems
    that stem from trying to fake wildcards via interfaces
  * Model 3: basically the same as Model 2, except members are accessed
    via indy (which mitigated some of the problems but not all)

The conclusion was: compiler-driven translation tricks are not going 
to cut it (as we suspected all along). We’ve since explored two 
other models (call them 4 and 5) which explore a range of options 
for VM support for wildcards. What follows is a preliminary analysis 
of these options.

    Reflection, classes, and runtime types

While it may not be immediately obvious that this subject is deeply 
connected to reflection, consider a typical implementation of |equals()|:

|class Box<T> {
    T t;

    public boolean equals(Object o) {
        if (!(o instanceof Box)) return false;
        Box other = (Box) o;
        // Null-safe comparison of the boxed values
        return (t == null) ? other.t == null : t.equals(other.t);
    }
}|

Some implementations use raw types (|Box|) for the |instanceof| and cast 
target; others use wildcards (|Box<?>|). While the latter is 
recommended, both are in wide circulation. In any case, as 
observed in the last mail, were we to interpret |Box| or |Box<?>| as 
only including erased boxes, then this code would silently break.
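
To make the concern concrete, here is a minimal sketch in today's erased Java. The names |BoxEqualsDemo|, |isBoxRaw| and |isBoxWild| are made up for illustration. Both spellings of the test currently admit every |Box|, whatever its type argument; the point above is that this has to stay true once specialized boxes exist.

```java
import java.util.Objects;

// Illustrative only: a Box whose equals() uses the wildcard form of the test.
final class Box<T> {
    final T t;

    Box(T t) { this.t = t; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Box<?>)) return false;
        Box<?> other = (Box<?>) o;
        return Objects.equals(t, other.t);   // null-safe comparison
    }

    @Override
    public int hashCode() { return Objects.hashCode(t); }
}

final class BoxEqualsDemo {
    // Raw-type spelling of the membership test.
    static boolean isBoxRaw(Object o) { return o instanceof Box; }

    // Wildcard spelling of the same test.
    static boolean isBoxWild(Object o) { return o instanceof Box<?>; }

    public static void main(String[] args) {
        Object strings = new Box<String>("x");
        Object numbers = new Box<Integer>(42);
        System.out.println(isBoxRaw(strings) + " " + isBoxWild(strings));
        System.out.println(isBoxRaw(numbers) + " " + isBoxWild(numbers));
        System.out.println(strings.equals(new Box<String>("x")));
    }
}
```

If either spelling were narrowed to mean "erased boxes only", an |equals()| written this way would start returning false for specialized instances without any compile-time signal.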

The term “class” is horribly overloaded, used to describe the source 
class (|class Foo { ... }|), the binary classfile, the runtime type 
derived from the classfile, and the reflective mirror for that runtime 
type. In the past these existed in 1:1 correspondence, but no more — a 
single source class now gives rise to a number of runtime types. Having 
poor terminology causes confusion, so let’s refine these terms:

  * /class/ refers to a source-level class declaration
  * /classfile/ refers to the binary classfile
  * /template/ refers to the runtime representation of a classfile
  * /runtime type/ refers to a primitive, value, class, or interface
    type managed by the VM

So historically, all objects had a class, which equally described the 
source class, the classfile, and the runtime type. Going forward, the 
class and the runtime type of an object are distinct concepts. So an 
|ArrayList<int>| has a /class/ of |ArrayList|, but a /runtime type/ of 
|ArrayList<int>|. Our code name for runtime type is /crass/ (obviously a 
better name is needed, but we’ll paint that bikeshed later.)

This allows us to untangle a question that’s been bugging us: what 
should |Object.getClass()| return on an |ArrayList<int>|? If we return 
|ArrayList|, then we can’t distinguish between an erased and a 
specialized object (bad); if we return |ArrayList<int>|, then existing 
code that depends on |(x.getClass() == ArrayList.class)| may break (bad).

The answer is, of course, that there are two questions the user can ask 
an object: what is your /class/, and what is your /crass/, and they need 
to be detangled. The existing method |getClass()| will continue to 
return the class mirror; a new method (|getCrass()|) will return a 
runtime type mirror of some form for the runtime type. Similarly, a 
class literal will evaluate to a class, and some other form of literal / 
reflective lookup will be needed for crass.
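
The split between the two questions can be modelled in plain Java. This is only a sketch: |Crass|, |SimList| and the string-valued type argument are hypothetical stand-ins, not a proposed API, since current Java has no specialized generics to ask about.

```java
import java.util.Objects;

// Hypothetical runtime type mirror: a class plus the type argument it was
// instantiated with ("erased", "int", ...). Purely a model for illustration.
final class Crass {
    final Class<?> clazz;
    final String typeArg;

    Crass(Class<?> clazz, String typeArg) {
        this.clazz = clazz;
        this.typeArg = typeArg;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Crass)) return false;
        Crass c = (Crass) o;
        return clazz == c.clazz && typeArg.equals(c.typeArg);
    }

    @Override
    public int hashCode() { return Objects.hash(clazz, typeArg); }

    @Override
    public String toString() { return clazz.getSimpleName() + "<" + typeArg + ">"; }
}

// A simulated any-generic list that remembers how it was parameterized.
final class SimList {
    private final Crass crass;

    SimList(String typeArg) { this.crass = new Crass(SimList.class, typeArg); }

    // Hypothetical analogue of getCrass(); getClass() is the real, unchanged method.
    Crass getCrass() { return crass; }
}

final class ClassVsCrassDemo {
    static boolean sameClass(Object a, Object b) { return a.getClass() == b.getClass(); }

    static boolean sameCrass(SimList a, SimList b) { return a.getCrass().equals(b.getCrass()); }

    public static void main(String[] args) {
        SimList erased = new SimList("erased");
        SimList ints = new SimList("int");
        System.out.println(sameClass(erased, ints));  // the "class" question
        System.out.println(sameCrass(erased, ints));  // the "crass" question
        System.out.println(ints.getCrass());
    }
}
```

In this model, code that compares |getClass()| results keeps its existing behavior, while code that needs to tell an erased list from a specialized one asks the second question.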

The reflective features built into the language (|instanceof|, casting, 
class literals, |getClass()|) are mostly tilted towards classes, not 
types. (Some exceptions: you can use a wildcard type in an |instanceof|, 
and you can do unchecked static casts to generic types, which are 
erased.) We need to extend these to deal in both classes /and/ crasses. 
For |getClass()| and literals, there’s an obvious path: have two forms. 
For casting, we are mostly there (except for the treatment of raw types 
for any-generic classes — which we need to work out separately.) For 
|instanceof|, it seems a forced move that |instanceof Foo| is interpreted 
as “an instance of any runtime type projected from class Foo”, but we 
also would want to apply it to any reifiable type as well.

    Wildcard types

In Model 3, we express a parameterized type with a |ParamType| constant, 
which names a template class and a set of type parameters, which include 
both valid runtime types as well as the special type parameter token 
|erased|. One natural way to express a wildcard type is to introduce a 
new special type parameter token, |wild|, so we’d translate |Foo<any>| 
as |ParamType[Foo,wild]|.

In order for wildcard types to work seamlessly, the minimum 
functionality we’d need from the VM is to manage subtyping (which is 
used by the VM for |instanceof|, |checkcast|, verification, array store 
checks, and array covariance.) The wildcard must be seen to be a “top” 
type for all parameterizations:

|ParamType[Foo,T] <: ParamType[Foo,wild] // for all valid T |

And, wildcard parameterizations must be seen to be subtypes of their 
wildcard-parameterized supertypes. If we have

|class Foo<any T> extends Bar<T> implements I<T> { ... }
class Moo<any T> extends Goo { }|

then we expect

|ParamType[Foo,wild] <: ParamType[Bar,wild]
ParamType[Foo,wild] <: ParamType[I,wild]
ParamType[Moo,wild] <: Goo|
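
The subtyping rules above can be checked against a toy model. The sketch below is an assumption-laden illustration, not VM behavior: it handles a single type parameter, represents type arguments as strings, and hard-codes the |Foo|/|Bar|/|I|/|Moo|/|Goo| hierarchy from the example. |RType|, |Parent| and |isSubtype| are invented names.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class WildSubtyping {
    static final String WILD = "wild";

    // Toy runtime type: a template name plus one type argument
    // (arg == null models a non-generic class such as Goo).
    static final class RType {
        final String template;
        final String arg;

        RType(String template, String arg) {
            this.template = template;
            this.arg = arg;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RType)) return false;
            RType r = (RType) o;
            return template.equals(r.template) && Objects.equals(arg, r.arg);
        }

        @Override
        public int hashCode() { return Objects.hash(template, arg); }
    }

    // A declared supertype; passesT is true for "extends Bar<T>" style clauses.
    static final class Parent {
        final String template;
        final boolean passesT;

        Parent(String template, boolean passesT) {
            this.template = template;
            this.passesT = passesT;
        }
    }

    // Hierarchy from the example: Foo<T> extends Bar<T> implements I<T>; Moo<T> extends Goo.
    static final Map<String, List<Parent>> PARENTS = Map.of(
            "Foo", List.of(new Parent("Bar", true), new Parent("I", true)),
            "Moo", List.of(new Parent("Goo", false)));

    static RType type(String template, String arg) { return new RType(template, arg); }

    static boolean isSubtype(RType a, RType b) {
        if (a.equals(b)) return true;
        // Same template: only the wildcard is a supertype of a different argument.
        if (a.template.equals(b.template)) return WILD.equals(b.arg);
        // Otherwise walk up the declared supertypes, carrying the argument along.
        for (Parent p : PARENTS.getOrDefault(a.template, List.of())) {
            RType sup = new RType(p.template, p.passesT ? a.arg : null);
            if (isSubtype(sup, b)) return true;
        }
        return false;
    }
}
```

For example, |isSubtype(type("Foo", "int"), type("Bar", "wild"))| holds in this model by first stepping to |Bar| with argument |int| and then applying the wildcard rule, while |isSubtype(type("Foo", "wild"), type("Foo", "int"))| does not.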

Wildcards must also support method invocation and field access for the 
members in the intersection of the members of all parameterizations. 
These are the total members (those not restricted to particular 
instantiations) whose member descriptors do not contain any type 
variables. We can continue to implement member access via 
invokedynamic (as we do in Model 3), or alternatively the VM can 
support |invoke*| bytecodes on wildcard receivers.
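
As a rough approximation of "members whose descriptors do not mention a type variable", one can ask core reflection for the generic signatures of a class's methods and keep those that are free of type variables. This is a sketch under assumptions: it inspects generic signatures rather than VM descriptors, it ignores the "total members" condition, and |Cell| and the helper names are invented for the example.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.lang.reflect.WildcardType;
import java.util.Set;
import java.util.TreeSet;

final class WildcardMembers {
    // Illustrative generic class: get() and set() mention T, size() and clear() do not.
    static class Cell<T> {
        private T value;
        public T get() { return value; }
        public void set(T v) { value = v; }
        public int size() { return value == null ? 0 : 1; }
        public void clear() { value = null; }
    }

    // True if the type is, or structurally contains, a type variable.
    static boolean mentionsTypeVariable(Type t) {
        if (t instanceof TypeVariable) return true;
        if (t instanceof GenericArrayType) {
            return mentionsTypeVariable(((GenericArrayType) t).getGenericComponentType());
        }
        if (t instanceof ParameterizedType) {
            for (Type arg : ((ParameterizedType) t).getActualTypeArguments()) {
                if (mentionsTypeVariable(arg)) return true;
            }
            return false;
        }
        if (t instanceof WildcardType) {
            WildcardType w = (WildcardType) t;
            for (Type b : w.getUpperBounds()) if (mentionsTypeVariable(b)) return true;
            for (Type b : w.getLowerBounds()) if (mentionsTypeVariable(b)) return true;
            return false;
        }
        return false; // a plain Class
    }

    // Names of declared methods whose return and parameter types avoid type variables.
    static Set<String> typeVariableFreeMethods(Class<?> c) {
        Set<String> names = new TreeSet<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;
            boolean free = !mentionsTypeVariable(m.getGenericReturnType());
            for (Type p : m.getGenericParameterTypes()) {
                free &= !mentionsTypeVariable(p);
            }
            if (free) names.add(m.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(typeVariableFreeMethods(Cell.class));
    }
}
```

In this model, only the methods returned by |typeVariableFreeMethods| would be callable through a wildcard receiver without any adaptation of arguments or results.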

We can apply these wildcard behaviors to any of the wildcard models 
(i.e., retrofit them onto Model 2/3.)

        Partial wildcards

With multiple type variables, the rules for wildcards generalize 
cleanly, but the number of wildcard types that are a supertype of any 
given parameterized type grows exponentially in the number of type 
variables. We are considering adopting the simplification of erasing all 
partial wildcards in the source type system to a total wildcard in the 
runtime type system (the costs of this are: some additional boxing on 
access paths where boxing might not be necessary, and unchecked casts 
when casting a broader wildcard to a narrower one.)
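
A small enumeration shows where the exponential growth comes from: each of the n type arguments can independently be kept or replaced by |wild|, giving 2^n variants of one parameterization (including the parameterization itself). The sketch also models the simplification under consideration. The names and the string representation of type arguments are assumptions made for the example, and the input is assumed to contain no |wild| entries when enumerating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartialWildcards {
    static final String WILD = "wild";

    // Every way of replacing a subset of the type arguments with `wild`.
    // The empty subset (mask 0) yields the original parameterization.
    static List<List<String>> wildcardVariants(List<String> args) {
        int n = args.size();
        List<List<String>> out = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> variant = new ArrayList<>(args);
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) variant.set(i, WILD);
            }
            out.add(variant);
        }
        return out;
    }

    // Model of the simplification: a wildcard in any position widens the
    // whole parameterization to the total wildcard in the runtime type system.
    static List<String> eraseToTotal(List<String> args) {
        if (!args.contains(WILD)) return args;
        String[] all = new String[args.size()];
        Arrays.fill(all, WILD);
        return Arrays.asList(all);
    }

    public static void main(String[] args) {
        List<String> params = Arrays.asList("int", "long", "String");
        for (List<String> v : wildcardVariants(params)) {
            System.out.println(v + " -> " + eraseToTotal(v));
        }
    }
}
```

With the simplification, the runtime only ever sees two shapes per parameterization: the fully concrete one and the total wildcard.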

    Model 4

A constraint we are under is: existing binaries translate the types 
|Foo| (raw type), |Foo<String>| (erased parameterization), and |Foo<?>| 
all as |LFoo;| (or its equivalent, |CONSTANT_Class[Foo]|); since 
existing code treats this as meaning an erased class, the natural path 
would be to continue to interpret |LFoo;| as an erased class.

Model 4 asks the question: “can we reinterpret legacy |LFoo;| in 
classfiles, and |Foo<?>| in source files, as |any Foo|?” (restoring the 
interpretation of |Foo<?>| to be more in line with user intuition).

Not surprisingly, the cost of reinterpreting the binaries is extensive. 
Many bytecodes would have to be reinterpreted, including |new|, 
|{get,put}field|, |invoke*|, to make up the difference between the 
legacy meaning of these constructs and the desired new meaning. Worse, 
while boxing provides us a means to have a common representation of 
signatures involving |T| (T’s bound), in order to get to a common 
representation for signatures involving |T[]|, we’d need to either (a) 
make |int[]| a subtype of |Object[]| or (b) have a “boxing conversion” 
from |int[]| to |Object[]| (which would be a proxy box; the data would 
still live in the original |int[]|.) Both are intrusive into the 
|aaload| and |aastore| bytecodes and still are not anomaly-free.
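
To give a feel for option (b), here is a sketch of a "proxy box" over an |int[]| in current Java. An actual |Object[]| cannot be proxied today, so the sketch uses a |List<Object>| view as a stand-in; that substitution, and the names |IntArrayProxy| and |boxedView|, are assumptions for illustration only.

```java
import java.util.AbstractList;
import java.util.List;

final class IntArrayProxy {
    // A write-through boxed view: elements are boxed on read and unboxed on
    // write, but the data continues to live in the original int[].
    static List<Object> boxedView(final int[] data) {
        return new AbstractList<Object>() {
            @Override
            public Object get(int i) {
                return data[i];              // boxes the int on the way out
            }

            @Override
            public Object set(int i, Object v) {
                Object old = data[i];
                data[i] = (Integer) v;       // non-Integer values fail with ClassCastException
                return old;
            }

            @Override
            public int size() {
                return data.length;
            }
        };
    }

    public static void main(String[] args) {
        int[] raw = {1, 2, 3};
        List<Object> view = boxedView(raw);
        view.set(1, 42);                     // write through the proxy
        raw[0] = 7;                          // write to the underlying array
        System.out.println(raw[1] + " " + view.get(0));
    }
}
```

Even in this toy form, every element read and write goes through a conversion and a type check, which hints at why pushing the same idea into |aaload| and |aastore| is intrusive.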

So, overall, while this seems possible, the implementation cost is very 
high. All of that cost is incurred for the sake of migration, yet it 
would remain as a legacy constraint long after the old code has been 
migrated.

    Model 5

Model 5 asks the simpler question: can we continue to interpret |LFoo;| 
as erased in legacy classfiles, but upgrade to treating |Foo<?>| as is 
expected in source code? This entails changing the compiler’s 
translation of |Foo<?>| from “erased Foo” to |ParamType[Foo,wild]|.

This is far less intrusive into the bytecode behavior — legacy code 
would continue to mean what it did at compile time. It does require some 
migration support for handling the fact that field and method 
descriptors have changed (but this is a problem we’re already working on 
for managing the migration of reference classes to value classes.) There 
are also some possible source incompatibilities in the face of separate 
compilation (to be quantified separately).

Model 5 allows users to keep their |Foo<?>| and have it mean what they 
think it should mean. So we don’t need to introduce a confusing 
|Foo<any>| wildcard, but we will need a way of saying “erased Foo”, 
which might be |Foo<? extends Object>| or might be something more 
compact like |Foo<erased>|.


Comparing the three models for wildcards (2, 4, 5):

  * Model 2 defines the source construct |Foo<?>| to permanently mean
    |Foo<erased ref>|, even when |Foo| is anyfied, and introduces a new
    wildcard |Foo<any>| — but maintains source and binary compatibility.
  * Model 4 lets us keep |Foo<?>|, and retroactively redefines bytecode
    behavior — so an old binary can still interoperate with a reified
    generic instance, and will think a |Foo<int>| is really a
  * Model 5 redefines the /source/ meaning of |Foo<?>| to be what users
    expect, but because we don’t reinterpret old binaries, allows some
    source incompatibility during migration.

I think this pretty much explores the solution space. Our choices are: 
break the user model of what |Foo<?>| means, take a probably prohibitive 
hit to distort the VM to apply new semantics to old bytecode, or accept 
some limited source incompatibility under separate compilation but 
rescue the source form that users want.

In my opinion, the Model 5 direction offers the best balance of costs 
and benefits. While there is some short-term migration pain (confined to 
relatively limited cases, and mitigable with compiler help), in 
the long run, it gets us to the world we want without permanently 
burdening either the language (creating confusion between |Foo<?>| and 
|Foo<any>|) or the VM implementation.

In all these cases, we still haven’t defined the semantics of /raw 
types/. Raw types existed for migration between pre-generic and generic 
code; we still have that migration problem, plus the new migration 
problems of generic to any-generic, and of pre-generic to any-generic. 
So in any case, we’re going to need to define suitable semantics for raw 
types corresponding to any-generic classes.

