[LW100] Specialized generics -- user model issues

Brian Goetz brian.goetz at oracle.com
Wed Oct 17 20:55:13 UTC 2018

I spent some time trawling through several years of Valhalla e-mails to 
excavate the user-model issues inherent in specialized generics. I won’t 
address translation, VM support for template classes, or binary 
migration compatibility issues in this mail — this is focused on user 
model and source compatibility of migrating existing libraries.

        Type variables

In erased generics, all type variables are assumed to be erased, and 
instantiations are assumed to be subtypes of |Object|. This means that 
the following idioms are fair game in erased generic code (where |t| is 
of type |T|):

  * t = null
  * t == null
  * t == anyReferenceType
  * Object o = t
  * Object[] os = arrayOfT
  * synchronized (t) { … }

Since not all of these will be supported by value types, type variables 
need to be restricted if we intend to generify over values (and 
primitives). Our working model is to qualify such type variables with an 
|any| modifier, which triggers stronger type checking.

*Comparison with null.* We had previously rejected comparisons between 
an any-var (avar) and |null|. However, this is unlikely to be good 
enough; it seems entirely reasonable for a T-accepting method to reject 
null inputs:

|void accept(T t) { if (t == null) throw new NPE(); // do stuff } |

A preferable solution is to allow comparison to null, and at 
specialization time, constant-fold the calculation to false when |T| is 
a (non-nullable) value type.

*Assignment to null.* Assigning |null| to an avar is something we can 
generally reject in any-generic code, though we’ve recently started 
discussing how we might treat some value types as nullable (using 
zero-field detection). In this case, not only might we want to support 
this when we know T must be nullable. (This could be managed by a type 
constraint |<T extends Nullable>|, where all reference types are 
nullable, as well as all nullable value types.) For clients that just 
want a zero-ish sentinel, they can use |T.default| instead of |null|, 
which evaluates to |null| for erased |T|.

*Comparison to |Object|.* There’s a larger discussion to be had on |==| 
on values; this is part of it.

*Assignment to |Object|.* As with primitives, this is a boxing + 
widening reference conversion.

*Array covariance.* This is a larger discussion.

*Synchronization*. This can be rejected at compile time, or we can make 
heroic attempts to have this work in the VM. This is a larger discussion.

To the extent that various dependent operations are implemented one way 
for values and another for references, the compiler and template class 
mechanism will need to conspire to accept a bytecode representation that 
specializes down to the right thing.

We had been assuming that for compatibility, we would specialize value 
and primitive instantiations but leave reference instantiations erased. 
However, this choice (and how to override it) is open to discussion, as 
the specialization mechanisms we prototyped seem like they would be up 
for specializing on reference types too.


In Q-world, value types did not derive from |Object|, and could not be 
used as the type operand of |instanceof|. This meant that the usual 
|equals()| idiom:

|if (other instanceof Me) { Me o = (Me) other; return this.x == o.x && 
this.y == o.y; } else return false; |

could not be used. Instead, the signature of |equals()| was tightened to 
take the value type itself. This caused all sorts of downstream pain. 
But, now that there exists a boxing+widening conversion from any value 
type to |Object|, and we can support value types in |instanceof|, we are 
in better shape; value types can derive from |Object|, and implement 
|equals(Object)| in the usual way. As we’ll see, this eliminates a lot 
of pain.


A number of methods in Collection (|contains|, |remove|, etc) take 
|Object| instead of |T|. (This allows us to ask questions like 
|dogs.contains(animal)|. One could argue that this is not necessary, but 
its the collections API we have.)

In Q-world, these methods were problematic, because they would 
eventually bottom out in |Object::equals|:

|boolean contains(Object o) { for (T t : elements) return t.equals(o); 
return false; } |

But, if |V::equals| took a |V|, this code would not compile. Our 
previous-best (not necessarily good) strategy for this was a complicated 
dependent type that amounted to /(erased T ? Object : T)/, which would 
let the signature of reified collections be |contains(T)| and erased 
collections be |contains(Object)|.

Having healed the rift by allowing value classes to extend |Object| (or 
close enough), this problem goes away; the existing signatures and 
implementations are good enough.


The problem that |equals()| imposes on |contains()| is further kicked up 
the road to |containsAll()|, which accepts a |Collection<?>| rather than 
a more narrowly typed collection (for the same reason: 
|dogs.containsAll(animals)|.) And, having addressed |contains(Object)|, 
we’re in good shape for |containsAll|:

|boolean containsAll(Collection<?> c) { for (Object o : c) if 
(!contains(o)) return false; return true; } |

This assumes that a wildcard instantiation of an any-generic class 
refers to any instantiation, not just an erased instantiation (which is 
essentially a forced move if we don’t want users to hate us.)

        Implementing Object::equals

Consider this erased generic class:

|class Box<T> { T t; public Box(T t) { this.t = t; } public boolean 
equals(Object other) { return (other instanceof Box b) // pattern 
matching FTW! && Objects.equals(t, b.t); } |

This code is OK, but it is somewhat unsatisfying, because we’re not 
comparing that the two boxes hold the same type. This might be OK for 
|Box| — because we’re going to call |equals()| on the held |T| value — 
but doesn’t work as well for |List|. If I have an erased generic |List|, 
then this implementation of |equals()| (which is what |List::equals| 
does) would say that an empty |List<Integer>| is equal to an empty 
|List<String>|. This is kind of unsatisfying. Its the best we can do 
when the types are erased, but it’s not the best we can do for a 
specialized |List|. Unfortunately, we would like to use the same code 
for both.

But, it gets worse. If I’m an erased |Box|, then casting to |Box| or 
|Box<?>| is the best I can do, and we don’t want to lie to the user and 
let them ask |instanceof Box<T>| when we can’t really answer that 
question. But if I’m a specialized |Box|, I’d like to check against and 
cast to |Box<T>|, because I would like to take a |T| out of the box, and 
not just |Object|.

We could punt here, and say this is the cost of possibly-erased 
generics. And this might be OK. Our previous best (but not necessarily 
good) idea was to introduce a dependent type |T.OR_ERASED|, which would 
basically mean /(erased T ? erasure(T) : T)/. Then we could ask

|if (other instanceof Box<T.OR_ERASED>) Box<T.OR_ERASED> o = 
(Box<T.OR_ERASED>) other; T t = o.get(); ... } |

which is exactly the question we want to ask. It also turns out to be 
how we’re likely to represent |Box<T>| in the class file. But, it may be 
too much to ask of users. (Though, we can give them the choice; users 
who are willing to let an empty |List<int>| be equal to an empty 
|List<String>| can go the old route, and users who want finer control 
can use the new locution.)


List contains the following methods:

|boolean remove(Object object); E remove(int index); |

The former is inherited from |Collection|; the latter means “remove the 
element at the given index.” The authors of this method thought there 
was no way this could be an overload conflict, and so called this method 
|remove| rather than the more verbose |removeAt|. But, if we allow the 
instantiation |List<int>|, now these methods become ambiguous.

Our story here was some sort of conditional-member trick, where we said 
something like:

  * |remove(int)| is only defined on reference instantiations of |List|
  * |List| acquires a new method, |removeAt(int)|.
  * On reference instantiations of |List|, |removeAt(int)| has a default
    that delegates to |remove(int)|.
  * On other instantiations of |List|, |removeAt(int)| is abstract.

This allowed existing clients to work, because all existing 
instantiations of |List| are typed as some sort of erased |List|. (If a 
client anyfied their use of |List|, then its reasonable to ask them to 
change their code.) It also allows existing subclasses to work, as these 
are all erased. New clients and subclasses would have to use and declare 
the new |removeAt(int)| method.

The leading syntactic choice for declaring conditional methods is to use 
|this| parameters:

|E remove(List<ref T> this, int index); E removeAt(int index); default 
removeAt(List<ref T> this, int index) { return remove(index); } |

Note that to get this right, we needed to declare |removeAt(int)| twice, 
once as a total interface method, and once as a partial default.


|Map::get| uses |null| to signal that the specified key was not found in 
the map. For (non-nullable) value types, this is fundamentally a 
misguided method signature. (In general, an unconditional get method is 
ill-suited to types that have no available sentinel value, such as 
|int|. Types that have spare bit patterns, such as |LocalDate|, could 
still work in this model. Note that this pushes on the so-far implicit 
linkage between erasure, reference instantiation, and nullability, which 
may need to be revisited.)

One approach is to treat |Map::get| as being the same sort of pariah 
that |List::remove(int)| is; we’ll grudgingly welcome it in erased 
company, but as our types become more refined, we erase our memory of 
our erstwhile coarser companions:

|V get(Map<K, ref V> this, K key); |

Of course, we have to add some new methods to replace this. This could 
be an |Optional|-returning |get()|, or a |get()| that takes a 
user-supplied sentinel, or a pattern with a binding variable. This is an 
open libraries question.

An alternate route is to recognize that there’s a projection from every 
type to a closely-related nullable type. If we give every value type a 
companion box type, and have a way to denote it, we can totalize this 
notation trivially, and declare |get| as:

|V.Box get(K k) |

For erased instantiations, |V.Box| is just |V|. In L-world, I like this 
solution as the boxing conversion is so cheap, that attempting to avoid 
boxing doesn’t seem worth it. (Which is what winning looks like!)


The return type of |Collection::toArray| is |Object[]|, rather than 
|T[]|. This is largely a forced move for erased collections (otherwise 
we’d risk heap pollution without issuing unchecked warnings), but for a 
specialized collection, we can (and want to) do better.

As with |Map::get|, there are a few paths. We could restrict |toArray()| 
to erased instantiations, and expose a new method, such as 
|toArray(factory)|, as |Stream| does now (and which |Collection| is 
likely to get soon.) This is doable, but kind of sucks as the user 
shouldn’t have to provide |T[]::new| as a factory for a reified 
collection, since the collection presumably already knows it is a 
collection of |T|.

Alternately, we could make another trip to the type operator well, and 
define |T.ARRAY| as /(erased T ? Object[] : T[])/. (This is a close 
relative of the type operator we floated for |equals()|, and they could 
probably be merged.) Then we redefine

|T.ARRAY toArray() |

which erases to |Object[]| for erased instantiations, making existing 
clients and subclasses happy.

        9-way method sets

There are a number of method overload sets (such as 
|StringBuilder::append| or |Arrays::fill|) that have overloads for all 
the types (eight primitives, and one generic or Object version.) In 
Q-World, we had trouble specializing code that called these methods, 
because there was no applicable method for value instantiations. (For 
some of these (|Arrays::fill|), we’d like to turn these into a single 
specializable method, with bridges for the existing 
hand-specializations.) Our troubles here in Q-World stemmed from the 
fact that we were not able to conclude that these overload groups could 
be effectively unified into a total signature, so we could not call 
these methods from any-generic code. This is an open issue involving 
membership computation (can the compiler assume there is a total member 
if it finds a covering set of overloads?) and template classfile design 
(can we encode an invocation of such a method in a template classfile.)

Separately, we either need to decide to lump the value versions of these 
in with |Object| (which works OK with |m(Object)| but not with 
|m(Object[])| unless we plunk for covariance), or to have a |Value| 
super type and let users code an |m(Value)| version.

        Species statics

In migrating generic code, it is not uncommon to use erasure as a way of 
sleazily sharing an instance. For example, this idiom is common:

|private static final Collection<?> EMPTY = new EmptyCollection(); 
@SuppressWarning(“unchecked”) public static<T> Collection<T> 
emptyCollection() { return (Collection<T>) EMPTY); } |

This is OK in erased code, but a no-go in specialized code. But, users 
will be reluctant to give up such idioms. We concluded the answer is 
that some statics are static to the /class/, and others care static to 
the /species/, and the above idiom is an example of the latter. (The 
type variables are in scope in a species context, but not a class context.)


I saved the worst for last. In the language type system, |Foo<?>| is a 
super type of every instantiation of |Foo|. Conveniently, we erase 
|Foo<T>|, |Foo<?>|, and raw |Foo| to the same thing — |LFoo;|, and 
existing classfiles will use this to denote both “erased Foo” and “any 
Foo”. It is likely we’re going to have to pull some serious magic to 
compatibly retain this illusion. But, we’re talking about user model, 
not migration compatibility. Which means that we need a way to write 
down a wildcard instantiation, and the VM needs to know that every 
species is a subtype of the wildcard.

There are multiple wildcard instantiations, since there could be 
multiple type variables. If we have |Foo<T,U>|, we could have 
instantiations |Foo<?,String>| and |Foo<String,?>| and |Foo<?,?>|, with:

|Foo<String,String> <: Foo<String,?> <: Foo<?, ?> Foo<String,String> <: 
Foo<?,String> <: Foo<?, ?> |

We could assign each wildcard type its own runtime type, or we could 
collapse them to a single all-wild type. The tradeoff here is boxing vs 
type proliferation; if we give each wildcard its own runtime type, we 
get sharper type signatures (|Map<?,int>| is known to return |int|, 
rather than erasing to |Object|) but more types and more bridges, but 
less boxing. (Given that boxing just got a lot cheaper, that gives us a 
nudge towards the simpler solution.)

    Summary so far

The story so far is much better than it was in Q-World! Some of the 
problems (such as |equals(Object)|) from Q-World have just gone away, 
and others are amenable to techniques we have already identified. 
Candidate pieces of the solution include:

  * Template classfiles
  * Species statics
  * Partial / conditional members
  * |T.default|
  * |T.BOX|
  * |T.ARRAY|

Open issues include:

  * Array covariance
  * Equality comparisons
  * Locking
  * Whether “erased” and “reference instantiation” are coupled
  * Whether we can get away with projected types like |T.OR_ERASED|
  * A top value type
  * Representation of wildcards

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/attachments/20181017/01d52b42/attachment-0001.html>

More information about the valhalla-spec-experts mailing list