Wildcards -- Models 4 and 5

Remi Forax forax at univ-mlv.fr
Thu Jun 2 10:21:30 UTC 2016

There is another model (model 6).
In order to support species, we need a way to represent them at runtime,
so that per-species static state can be stored in a location that is neither alongside the instance fields nor alongside the static fields.

Currently, for the VM, an instance is represented like this:

 header  ----> class
 ------        vtable1
 field1        vtable2
 field2        ...
 ...           static fields

It can be a little different if the java.lang.Class object and the class of an instance are two different objects
(for the JIT it is better for the class to be a constant pointer, but java.lang.Class is a Java object that may need to be moved in memory).

Now, we want something like this:
 header  ----> species   ----------> class
 ------        vtable1
 field1        vtable2
 field2        ...
 ...           species fields

so at runtime the header of an object no longer points to a class but to a species (the runtime representation of a species), which in turn points to the class.
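
A minimal Java sketch of this two-level layout, for illustration only. The names ClassInfo, Species and Obj are hypothetical stand-ins for VM-internal structures, not actual HotSpot code, and the checks ignore subclassing (i.e. they assume an effectively final class).

```java
import java.util.List;

// Hypothetical model of the proposed layout; not actual VM code.
final class LayoutDemo {
    public static void main(String[] args) {
        ClassInfo arrayList = new ClassInfo("ArrayList");
        Species ofInt = new Species(arrayList, List.of("int"));
        Species ofLong = new Species(arrayList, List.of("long"));
        Obj obj = new Obj(ofInt);

        System.out.println(obj.isInstanceOfSpecies(ofInt));   // true
        System.out.println(obj.isInstanceOfSpecies(ofLong));  // false
        System.out.println(obj.isInstanceOfClass(arrayList)); // true
    }
}

final class ClassInfo {
    final String name;
    ClassInfo(String name) { this.name = name; }
}

final class Species {
    final ClassInfo clazz;        // species ----> class
    final List<String> typeArgs;  // e.g. ["int"] for ArrayList<int>
    Species(ClassInfo clazz, List<String> typeArgs) {
        this.clazz = clazz;
        this.typeArgs = typeArgs;
    }
}

final class Obj {
    final Species header;         // header ----> species
    Obj(Species header) { this.header = header; }

    // Species-level check: a single indirection, compares the header directly.
    boolean isInstanceOfSpecies(Species s) { return header == s; }

    // Class-level check: two indirections (header -> species -> class).
    boolean isInstanceOfClass(ClassInfo c) { return header.clazz == c; }
}
```

The class-level check is the one that pays for the extra hop through the species.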

This allows the VM to answer questions like:
  obj instanceof ArrayList<int>
and everybody cheers ...

No, because
  with <any E>, obj instanceof ArrayList<E> may or may not work.

In fact there is little reason to let a user see species at runtime*:
- it makes ArrayList<E> reified only sometimes, so with a Map<String, List<E>> map,
  map.get("foo") will sometimes throw an exception and sometimes not:
  because of erasure at compile time, the compiler inserts a cast to List<E> on the result of map.get(),
  and depending on whether E is a String or an int, the behavior will be different.
- the erasure concept (the runtime part) is entrenched in the minds of millions of developers,
  changing that is a recipe for disaster.
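
For reference, here is how the erased behavior looks today, as a small runnable sketch (the class and method names are made up for the example). The cast inserted on the result of map.get() only checks for List, so a polluted map goes unnoticed until an element is actually used. If List<E> were visibly reified for some E only, the same get() could instead fail at that cast, which is the inconsistency described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ErasedCastDemo {
    // Stores a List<Integer> in a Map<String, List<String>> by going
    // through a raw type (deliberate heap pollution).
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Map<String, List<String>> pollutedMap() {
        Map<String, List<String>> map = new HashMap<>();
        List<Integer> ints = new ArrayList<>();
        ints.add(42);
        ((Map) map).put("foo", ints);
        return map;
    }

    // With erasure, the cast on the result of get() is a cast to List,
    // so it succeeds whatever the element type is.
    static boolean getSucceeds() {
        List<String> list = pollutedMap().get("foo");
        return list.size() == 1;
    }

    // The failure only surfaces when an element is used as a String.
    static boolean elementUseFails() {
        List<String> list = pollutedMap().get("foo");
        try {
            int n = list.get(0).length(); // cast to String happens here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(getSucceeds());     // true
        System.out.println(elementUseFails()); // true
    }
}
```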

So IMO even if the VM reifies species at runtime, a developer should not be able to see that;
it is better to let developers believe that erasure at runtime works the same way in Java 10 as it does in Java 5.

This model has a cost at runtime: a checkcast/instanceof/arraystore to ArrayList<E> may be polymorphic where it was monomorphic before**,
or a dynamic typecheck requires a double indirection if the class is anyfied.

And for a wildcard, ArrayList<?> is mapped to ArrayList (from the runtime class point of view) as usual, so no big deal.

It's IMO a far better model, simply because from the user's point of view nothing changes.


* you can access a species field without seeing the species itself.
** let's suppose that ArrayList is effectively final here.

----- Original Message -----
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: valhalla-spec-experts at openjdk.java.net
> Sent: Friday, May 20, 2016 20:33:00
> Subject: Wildcards -- Models 4 and 5
> In the 4/20 mail “Wildcards and raw types: story so far”, we outlined
> our explorations for fitting wildcard types into the first several
> prototypes. The summary was:
>   * Model 1: no wildcards at all
>   * Model 2: A pale implementation of wildcards, with lots of problems
>     that stem from trying to fake wildcards via interfaces
>   * Model 3: basically the same as Model 2, except members are accessed
>     via indy (which mitigated some of the problems but not all)
> The conclusion was: compiler-driven translation tricks are not going
> to cut it (as we suspected all along). We’ve since explored two
> other models (call them 4 and 5) which explore a range of options
> for VM support for wildcards. The below is a preliminary analysis of
> these options.
>     Reflection, classes, and runtime types
> While it may not be immediately obvious that this subject is deeply
> connected to reflection, consider a typical implementation of |equals()|:
> |class Box<T> {
>     T t;
>
>     public boolean equals(Object o) {
>         if (!(o instanceof Box)) return false;
>         Box other = (Box) o;
>         return (t == null && other.t == null) || t.equals(other.t);
>     }
> }|
> Some implementations use raw types (|Box|) for the |instanceof| and cast
> target; others use wildcards (|Box<?>|). While the latter is
> recommended, both are in wide circulation. In any case, as
> observed in the last mail, were we to interpret |Box| or |Box<?>| as
> only including erased boxes, then this code would silently break.
> The term “class” is horribly overloaded, used to describe the source
> class (|class Foo { ... }|), the binary classfile, the runtime type
> derived from the classfile, and the reflective mirror for that runtime
> type. In the past these existed in 1:1 correspondence, but no more — a
> single source class now gives rise to a number of runtime types. Having
> poor terminology causes confusion, so let’s refine these terms:
>   * /class/ refers to a source-level class declaration
>   * /classfile/ refers to the binary classfile
>   * /template/ refers to the runtime representation of a classfile
>   * /runtime type/ refers to a primitive, value, class, or interface
>     type managed by the VM
> So historically, all objects had a class, which equally described the
> source class, the classfile, and the runtime type. Going forward, the
> class and the runtime type of an object are distinct concepts. So an
> |ArrayList<int>| has a /class/ of |ArrayList|, but a /runtime type/ of
> |ArrayList<int>|. Our code name for runtime type is /crass/ (obviously a
> better name is needed, but we’ll paint that bikeshed later.)
> This allows us to untangle a question that’s been bugging us: what
> should |Object.getClass()| return on an |ArrayList<int>|? If we return
> |ArrayList|, then we can’t distinguish between an erased and a
> specialized object (bad); if we return |ArrayList<int>|, then existing
> code that depends on |(x.getClass() == List.class)| may break (bad).
> The answer is, of course, that there are two questions the user can ask
> an object: what is your /class/, and what is your /crass/, and they need
> to be detangled. The existing method |getClass()| will continue to
> return the class mirror; a new method (|getCrass()|) will return a
> runtime type mirror of some form for the runtime type. Similarly, a
> class literal will evaluate to a class, and some other form of literal /
> reflective lookup will be needed for crass.
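
A small runnable reminder of what getClass() reports today, which is the behavior the existing method would keep under this proposal (the class name GetClassDemo and its helpers are made up for the example; getCrass() does not exist and is not shown):

```java
import java.util.ArrayList;
import java.util.List;

final class GetClassDemo {
    // True when both objects report the same class mirror.
    static boolean sameClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    // The kind of exact-class comparison that existing code relies on.
    static boolean isExactly(Object o, Class<?> c) {
        return o.getClass() == c;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();

        // Both parameterizations share one class mirror today.
        System.out.println(sameClass(strings, ints));           // true
        System.out.println(isExactly(strings, ArrayList.class)); // true
    }
}
```
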
> The reflective features built into the language (|instanceof|, casting,
> class literals, |getClass()|) are mostly tilted towards classes, not
> types. (Some exceptions: you can use a wildcard type in an |instanceof|,
> and you can do unchecked static casts to generic types, which are
> erased.) We need to extend these to deal in both classes /and/ crasses.
> For |getClass()| and literals, there’s an obvious path: have two forms.
> For casting, we are mostly there (except for the treatment of raw types
> for any-generic classes — which we need to work out separately.) For
> instanceof, it seems a forced move that |instanceof Foo| is interpreted
> as “an instance of any runtime type projected from class Foo”, but we
> also would want to apply it to any reifiable type as well.
>     Wildcard types
> In Model 3, we express a parameterized type with a |ParamType| constant,
> which names a template class and a set of type parameters, which include
> both valid runtime types as well as the special type parameter token
> |erased|. One natural way to express a wildcard type is to introduce a
> new special type parameter token, |wild|, so we’d translate |Foo<any>|
> as |ParamType[Foo,wild]|.
> In order for wildcard types to work seamlessly, the minimum
> functionality we’d need from the VM is to manage subtyping (which is
> used by the VM for |instanceof|, |checkcast|, verification, array store
> checks, and array covariance.) The wildcard must be seen to be a “top”
> type for all parameterizations:
> |ParamType[Foo,T] <: ParamType[Foo,wild] // for all valid T |
> And, wildcard parameterizations must be seen to be subtypes of their
> wildcard-parameterized supertypes. If we have
> |class Foo<any T> extends Bar<T> implements I<T> { ... }
> class Moo<any T> extends Goo { }|
> then we expect
> |ParamType[Foo,wild] <: ParamType[Bar,wild]
> ParamType[Foo,wild] <: ParamType[I,wild]
> ParamType[Moo,wild] <: Goo|
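
The subtyping rules above can be sketched as a toy checker. This is only an illustration under stated assumptions: the Type class, the two supertype tables and isSubtype are hypothetical, type arguments are plain strings, and only single-parameter templates are modeled. It says nothing about how a VM would actually implement the check.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class WildSubtyping {
    static final String WILD = "wild";

    // ParamType[template, arg]; arg == null models a plain class such as Goo.
    static final class Type {
        final String template;
        final String arg;
        Type(String template, String arg) {
            this.template = template;
            this.arg = arg;
        }
        boolean sameAs(Type o) {
            return template.equals(o.template) && Objects.equals(arg, o.arg);
        }
    }

    // Foo<any T> extends Bar<T> implements I<T>: supertypes receive the same T.
    static final Map<String, List<String>> PARAM_SUPERS =
            Map.of("Foo", List.of("Bar", "I"));
    // Moo<any T> extends Goo: the supertype is not parameterized.
    static final Map<String, List<String>> PLAIN_SUPERS =
            Map.of("Moo", List.of("Goo"));

    static boolean isSubtype(Type s, Type t) {
        if (s.sameAs(t)) return true;
        // ParamType[X,T] <: ParamType[X,wild] for all valid T
        if (s.arg != null && s.template.equals(t.template) && WILD.equals(t.arg)) {
            return true;
        }
        // Walk declared supertypes, carrying the type argument along.
        for (String sup : PARAM_SUPERS.getOrDefault(s.template, List.of())) {
            if (isSubtype(new Type(sup, s.arg), t)) return true;
        }
        for (String sup : PLAIN_SUPERS.getOrDefault(s.template, List.of())) {
            if (isSubtype(new Type(sup, null), t)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Type fooWild = new Type("Foo", WILD);
        System.out.println(isSubtype(new Type("Foo", "int"), fooWild));       // true
        System.out.println(isSubtype(fooWild, new Type("Bar", WILD)));        // true
        System.out.println(isSubtype(new Type("Moo", WILD), new Type("Goo", null))); // true
        System.out.println(isSubtype(fooWild, new Type("Foo", "int")));       // false
    }
}
```
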
> Wildcards must also support method invocation and field access to the
> members that are in the intersection of the members of all
> parameterizations (these are the total members (those not restricted to
> particular instantiations) whose member descriptors do not contain any
> type variables.) We can continue to implement member access via
> invokedynamic (as we do in Model 3), or alternately, the VM can support
> |invoke*| bytecodes on wildcard receivers.
> We can apply these wildcard behaviors to any of the wildcard models
> (i.e., retrofit them onto Model 2/3.)
>         Partial wildcards
> With multiple type variables, the rules for wildcards generalize
> cleanly, but the number of wildcard types that are a supertype of any
> given parameterized type grows exponentially in the number of type
> variables. We are considering adopting the simplification of erasing all
> partial wildcards in the source type system to a total wildcard in the
> runtime type system (the costs of this are: some additional boxing on
> access paths where boxing might not be necessary, and unchecked casts
> when casting a broader wildcard to a narrower one.)
>     Model 4
> A constraint we are under is: existing binaries translate the types
> |Foo| (raw type), |Foo<String>| (erased parameterization), and |Foo<?>|
> all as |LFoo;| (or its equivalent, |CONSTANT_Class[Foo]|); since
> existing code treats this as meaning an erased class, the natural path
> would be to continue to interpret |LFoo;| as an erased class.
> Model 4 asks the question: “can we reinterpret legacy |LFoo;| in
> classfiles, and |Foo<?>| in source files, as |any Foo|” (restoring the
> interpretation of |Foo<?>| to be more in line with user intuition.)
> Not surprisingly, the cost of reinterpreting the binaries is extensive.
> Many bytecodes would have to be reinterpreted, including |new|,
> |{get,put}field|, |invoke*|, to make up the difference between the
> legacy meaning of these constructs and the desired new meaning. Worse,
> while boxing provides us a means to have a common representation of
> signatures involving |T| (T’s bound), in order to get to a common
> representation for signatures involving |T[]|, we’d need to either (a)
> make |int[]| a subtype of |Object[]| or (b) have a “boxing conversion”
> from |int[]| to |Object[]| (which would be a proxy box; the data would
> still live in the original |int[]|.) Both are intrusive into the
> |aaload| and |aastore| bytecodes and still are not anomaly-free.
> So, overall, while this seems possible, the implementation cost is very
> high, all of which is for the sake of migration, which will remain as
> legacy constraints long after the old code has been migrated.
>     Model 5
> Model 5 asks the simpler question: can we continue to interpret |LFoo;|
> as erased in legacy classfiles, but upgrade to treating |Foo<?>| as is
> expected in source code? This entails changing the compilation
> translation of |Foo<?>| from “erased foo” to |ParamType[Foo,wild]|.
> This is far less intrusive into the bytecode behavior — legacy code
> would continue to mean what it did at compile time. It does require some
> migration support for handling the fact that field and method
> descriptors have changed (but this is a problem we’re already working on
> for managing the migration of reference classes to value classes.) There
> are also some possible source incompatibilities in the face of separate
> compilation (to be quantified separately).
> Model 5 allows users to keep their |Foo<?>| and have it mean what they
> think it should mean. So we don’t need to introduce a confusing
> |Foo<any>| wildcard, but we will need a way of saying “erased Foo”,
> which might be |Foo<? extends Object>| or might be something more
> compact like |Foo<erased>|.
>     Comparison
> Comparing the three models for wildcards (2, 4, 5):
>   * Model 2 defines the source construct |Foo<?>| to permanently mean
>     |Foo<erased ref>|, even when |Foo| is anyfied, and introduces a new
>     wildcard |Foo<any>| — but maintains source and binary compatibility.
>   * Model 4 lets us keep |Foo<?>|, and retroactively redefines bytecode
>     behavior — so an old binary can still interoperate with a reified
>     generic instance, and will think a |Foo<int>| is really a
>     |Foo<Integer>|.
>   * Model 5 redefines the /source/ meaning of |Foo<?>| to be what users
>     expect, but because we don’t reinterpret old binaries, allows some
>     source incompatibility during migration.
> I think this pretty much explores the solution space. Our choices are:
> break the user model of what |Foo<?>| means, take a probably prohibitive
> hit to distort the VM to apply new semantics to old bytecode, or accept
> some limited source incompatibility under separate compilation but
> rescue the source form that users want.
> In my opinion, the Model 5 direction offers the best balance of costs
> and benefits — while there is some short-term migration pain (in
> relatively limited cases, and can be mitigated with compiler help), in
> the long run, it gets us to the world we want without permanently
> burdening either the language (creating confusion between |Foo<?>| and
> |Foo<any>|) or the VM implementation.
> In all these cases, we still haven’t defined the semantics of /raw
> types/. Raw types existed for migration between pre-generic and generic
> code; we still have that migration problem, plus the new migration
> problems of generic to any-generic, and of pre-generic to any-generic.
> So in any case, we’re going to need to define suitable semantics for raw
> types corresponding to any-generic classes.
