[lvti] Handling of capture variables

Brian Goetz brian.goetz at oracle.com
Mon Apr 10 17:01:03 UTC 2017

Zooming out for a minute, let me illustrate the landscape for treatment 
of non-denotable types.  There are several kinds of non-denotable types 
that inference might come up with:

  - the Null type
  - capture types
  - intersection types
  - anonymous class types
  - array types with non-denotable elements
  - parameterized class/interface types with non-denotable type arguments
  - Under JEP 301, a specialized enum constant

It's also worth mentioning that some of these types, such as anonymous 
class types, are currently non-denotable at the source level but 
denotable at the classfile level.

And, there are three possible treatments for when inference encounters a 
non-denotable type:
  - infer it anyway
  - reject it, make the user provide a manifest type
  - infer a suitable denotable supertype

There are pros and cons for each of these approaches.  It's also not 
necessarily the case that we have to choose the same answer for each 
category of non-denotable types.

Initially, we leaned towards the "reject it" answer, because it was 
simple and because then every program using local variable type 
inference had an equivalent program where every variable was manifestly 
typed.  However, as Dan's analysis for capture types shows, this turned 
out to be a naive answer.  Not being able to say

     var x = Class.forName("com.foo.Foo");

(or any of the many other, sometimes surprising, situations where 
capture types come up, as outlined below) would go over very poorly, not 
only because it would be annoying to have to provide a manifest type, 
but because it's likely that this restriction would be an unpleasant 
surprise every time it came up.  (Even experts are often surprised to 
find that certain expressions have capture variables in their type.)

Also, a uniform policy across kinds of non-denotable types is probably 
not desirable either.  There's really no point in ever inferring the 
null type; the only thing you could assign to it would be null, which is 
rarely what the user wants.  So both the "always infer" and "always 
reject" approaches are simplistic.

For the easy cases, the answers are easy:
  - Null type: always reject
  - Anon class types / enum constants: always infer
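
To make the two easy cases concrete, here is a minimal sketch assuming 
a Java 10+ compiler (where 'var' eventually shipped).  The class and 
method names are made up for illustration, and JEP 301 specialized enum 
constants are not shown since that feature is not available.

```java
// Hypothetical class illustrating the "easy cases" policy (assumes Java 10+).
class AnonInference {
    static int counterDemo() {
        // Always infer: the variable's type is the anonymous class itself,
        // so the extra members 'count' and 'bump' stay accessible.
        var counter = new Object() {
            int count = 0;
            void bump() { count++; }
        };
        counter.bump();
        counter.bump();
        return counter.count;
    }

    // Always reject: 'var nothing = null;' does not compile, because the
    // null type is never inferred; a manifest type is required instead.
    static String nullDemo() {
        String nothing = null;
        return nothing;
    }
}
```

With a manifest type of Object, 'counter.bump()' would not compile; 
inferring the anonymous class type is what makes the members reachable.
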

For capture types, we've already got a reasonable tool for projecting a 
non-denotable capture type onto a suitable denotable supertype, which 
Dan will explicate below (and which our implementation already uses).  
With this, the answer for capture types (including array and class types 
that have captures in their component types or type arguments) is 
probably "project to a denotable supertype."
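
As a minimal sketch of what projection buys in practice, assuming a 
Java 10+ compiler (which, to my understanding, adopted this projection 
for 'var'), here is the Class.forName case from above.  The class and 
method names are hypothetical.

```java
// Hypothetical class; assumes a Java 10+ compiler.
class CaptureProjection {
    static String reassignDemo() {
        try {
            // The initializer's type is Class<CAP#1>; 'var' gives the
            // variable the denotable supertype Class<?> instead.
            var c = Class.forName("java.lang.Object");
            // Legal only because c has type Class<?>, not Class<CAP#1>.
            c = String.class;
            return c.getSimpleName();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the variable kept the raw capture type, the reassignment would be 
rejected with an error mentioning a capture variable the user never wrote.
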

We'll provide some details on the pros and cons of the various choices 
with intersection types (and some examples where intersection types pop 
up surprisingly).
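
One example of a surprising intersection type, as a sketch assuming a 
Java 10+ compiler that (as far as I know) infers intersection types 
as-is for 'var'.  The class and method names are hypothetical.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical class; assumes a Java 10+ compiler.
class IntersectionDemo {
    static boolean demo() {
        // With no target type, T is inferred as lub(Integer, String): an
        // intersection that includes Serializable and Comparable<...>.
        var mixed = Arrays.asList(1, "a");
        // Both assignments compile because the element type is that
        // intersection, which cannot be written as a variable's type.
        Serializable s = mixed.get(0);
        Comparable<?> c = mixed.get(1);
        return s.equals(1) && c.equals("a");
    }
}
```
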

On 3/31/2017 7:39 PM, Dan Smith wrote:
> As described in the JEP 286 spec document, inferring the type of a local variable to be a non-denotable type (one that can't be written in source) is something to be careful about, due to "potential for confusion, bad error messages, or added exposure to bugs".
> The most significant area here (in terms of likely frequency) is the presence of capture variables in the type. I did some analysis of the Java SE APIs to identify and illustrate problematic cases.
> == Case 1: wildcard-parameterized return type ==
> Any method (or field) that returns a wildcard-parameterized type will produce a non-denotable type on invocation, because the return type must be captured (JLS 15.12.3).
> var myClass = getClass();
> var c = Class.forName("java.lang.Object");
> var sup = String.class.getSuperclass();
> var entries = new ZipFile("/etc/filename.zip").entries();
> var joiner = Collectors.joining(" - \n", "<start>", "<end>");
> var plusCollector = Collectors.reducing(BigInteger.ZERO, BigInteger::add);
> var future = Executors.newCachedThreadPool().submit(System::gc);
> void m(MethodType type) { var ret = type.returnType(); }
> void m(TreeSet<String> set) { var comparator = set.comparator(); }
> void m(Annotation ann) { var annClass = ann.annotationType(); }
> void m(ReferenceQueue<String> queue) { var stringRef = queue.poll(); }
> Using wildcards in a return type is sometimes discouraged, but other times it's the right thing to do.  So while I wouldn't say these methods are pervasive, there are quite a few of them (especially where the common idiom is to almost always use a wildcard, as in Class and Collector).
> There are no capture variables present for methods that return arrays, lists, etc., of wildcard-parameterized types, because capture doesn't touch those nested wildcards:
> void m(MethodType type) { var params = type.parameterArray(); }
> void m(MethodType type) { var params = type.parameterList(); }
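
A runnable version of two Case 1 examples, as a sketch assuming a 
Java 10+ compiler that projects captured return types for 'var'.  The 
class and method names are hypothetical.

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class; assumes a Java 10+ compiler.
class WildcardReturnDemo {
    static String joinDemo() {
        // joining returns Collector<CharSequence, ?, String>; the invocation
        // type is captured, and 'var' projects it back to the wildcard form.
        var joiner = Collectors.joining(", ", "<", ">");
        return Stream.of("a", "b", "c").collect(joiner);
    }

    static boolean classDemo() {
        // getClass() here is Class<CAP extends String>; the variable gets
        // Class<? extends String>, so a Class<String> can be reassigned.
        var myClass = "text".getClass();
        myClass = String.class;
        return myClass == String.class;
    }
}
```
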
> == Case 2: instance method returning a class type parameter ==
> A method (or field) whose return type is a class type parameter will produce a capture variable when invoked for a wildcard-parameterized type.
> void m(Class<? extends Runnable> c) throws Exception { var runnable = c.newInstance(); }
> void m(Map<String, ? extends Throwable> map) { var e = map.get("some.key"); }
> void m(List<? extends Set<String>> sets) { var first = sets.get(0); }
> Object find(Collection<?> coll, Object o) { for (var elt : coll) { if (elt.equals(o)) return elt; } return null; }
> void m(Optional<? extends Number> opt) { var num = opt.get(); }
> void m(IntFunction<? extends Reader> f) { var reader = f.apply(14); }
> void m(Future<? extends ZipEntry> future) throws Exception { var entry = future.get(10, TimeUnit.SECONDS); }
> If you substitute a wildcard-parameterized type into the return type, that also leads to capture:
> void m(List<Set<? extends Number>> list) { var set = list.get(0); }
> This is true for for-each, too (for now, javac fails to perform capture correctly, so you don't see this in the prototype):
> void m(List<Set<? extends Number>> list) { for (var set : list) set.clear(); }
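
A runnable sketch of Case 2, assuming a Java 10+ compiler that projects 
each capture variable to its bound (or back to a wildcard) for 'var'.  
The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical class; assumes a Java 10+ compiler.
class TypeParamReturnDemo {
    static double firstAsDouble(Optional<? extends Number> opt) {
        // opt.get() returns a capture variable bounded by Number;
        // the variable is projected to Number.
        var num = opt.get();
        return num.doubleValue();
    }

    static String messageOf(Map<String, ? extends Throwable> map) {
        var e = map.get("some.key"); // projected to Throwable
        return e == null ? "none" : e.getMessage();
    }

    static int totalSize(List<Set<? extends Number>> list) {
        int total = 0;
        // Per the analysis above, each element is captured to Set<CAP>;
        // the projected type of 'set' is Set<? extends Number>.
        for (var set : list) total += set.size();
        return total;
    }
}
```
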
> == Case 3: instance method returning a type that mentions a class type parameter ==
> A method (or field) whose return type *mentions* a class type parameter (e.g., Iterator<E> in Iterable.iterator) will also produce a non-denotable type when invoked for a wildcard-parameterized type.  Unlike Case 2 methods, which tend to be "terminal operations", these types often arise in chains.
> var constructor = Class.forName("java.lang.Object").getConstructor();
> void m(Map<? extends Number, String> map) { var keys = map.keySet(); }
> void m(Map<? extends Number, String> map) { var iter = map.keySet().iterator(); }
> void m(TreeMap<String, ? extends Throwable> map) { var tail = map.subMap("b", "c"); }
> void m(TreeSet<String> set) { var reverseOrder = set.comparator().reversed(); }
> void m(List<? extends Number> list) { var unique = list.stream().distinct().sorted(); }
> void m(Stream<? extends Throwable> stream) { var best = stream.min(Comparator.comparing(e -> e.getStackTrace().length)); }
> void m(Function<? super String, File> f1, Function<? super File, Integer> f2) { var f = f1.andThen(f2); }
> void m(Predicate<? super File> discard) { var keep = discard.negate(); }
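
A runnable sketch of a few Case 3 chains, assuming a Java 10+ compiler 
that projects capture variables for 'var'.  The class and method names 
are hypothetical.

```java
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical class; assumes a Java 10+ compiler.
class ChainDemo {
    static double keySum(Map<? extends Number, String> map) {
        // keySet() returns Set<CAP>; the variable gets Set<? extends Number>.
        var keys = map.keySet();
        double sum = 0;
        for (Number n : keys) sum += n.doubleValue();
        return sum;
    }

    static String firstInReversedOrder(TreeSet<String> set) {
        // comparator() is Comparator<? super String>; reversed() yields
        // Comparator<CAP>, projected to Comparator<? super String>.
        // Assumes the set was built with an explicit comparator (non-null).
        var reverseOrder = set.comparator().reversed();
        var copy = new TreeSet<String>(reverseOrder);
        copy.addAll(set);
        return copy.first();
    }

    static boolean keeps(Predicate<? super String> discard, String s) {
        var keep = discard.negate(); // projected to Predicate<? super String>
        return keep.test(s);
    }
}
```
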
> == Case 4: method with inferred type parameter in return type ==
> A method (or constructor) whose return type includes an inferred type parameter may end up substituting capture variables or other non-denotable types.  This typically depends on the types of the arguments, again with a wildcard-parameterized type showing up somewhere.
> void m(Enumeration<? extends Runnable> tasks) { var list = Collections.list(tasks); }
> void m(Set<?> set) { var syncSet = Collections.synchronizedSet(set); }
> void m(Function<? super String, ? extends Throwable> f) { var es = Stream.of("a", "b", "c").map(f); }
> There are also cases here that are specified to produce capture vars but do not in javac:
> void m(List<? extends Number> ns) { var firstSet = Collections.singleton(ns.get(0)); }
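
A runnable sketch of Case 4, where inference substitutes a capture 
variable for the method's own type parameter.  It assumes a Java 10+ 
compiler that projects the result for 'var'; the class and method names 
are hypothetical.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.Set;

// Hypothetical class; assumes a Java 10+ compiler.
class InferredReturnDemo {
    static int runAll(Enumeration<? extends Runnable> tasks) {
        // Collections.list infers T as the capture variable, so the
        // initializer is ArrayList<CAP>; 'list' is ArrayList<? extends Runnable>.
        var list = Collections.list(tasks);
        for (Runnable r : list) r.run();
        return list.size();
    }

    static boolean syncContains(Set<?> set, Object o) {
        var syncSet = Collections.synchronizedSet(set); // projected to Set<?>
        return syncSet.contains(o);
    }
}
```
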
> ----------------
> With that in mind, looking at our three options for dealing with capture variables:
> 1) Allow the non-denotable type
> 2) Map the type to a supertype that is denotable
> 3) Report an error
> (3) isn't viable. "You can't use 'var' with 'getClass'" is already pretty bad. Prohibiting all the uses above would be really bad.
> We've thought a lot about (1) and (2). The JEP includes this example:
> void test(List<?> l1, List<?> l2) {
>      var l3 = l1; // List<CAP> or List<?>?
>      l3 = l2; // error?
>      l3.add(l3.get(0)); // error?
> }
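
Here is the JEP example under option (2), as a sketch assuming a Java 
10+ compiler that projects 'l3' to List<?>.  The class name is 
hypothetical.

```java
import java.util.List;

// Hypothetical class; assumes a Java 10+ compiler.
class JepExample {
    static Object test(List<?> l1, List<?> l2) {
        // Under projection, l3 has type List<?> rather than List<CAP#1>.
        var l3 = l1;
        l3 = l2;              // compiles: any List<?> is assignable
        // l3.add(l3.get(0)); // would not compile: l3 is List<?>, not List<CAP>
        return l3.get(0);
    }
}
```
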
> On 'l3 = l2': I wouldn't say it's an important priority that all 'var' variables have a type that is convenient for future mutation. But we do expect users to be able to easily see *why* an assignment wouldn't be allowed. Unfortunately, capture variables are such a subtle thing that they're often invisible, and programmers don't even realize that they appear as an intermediate step. So, most people would see 'var l3 = l1' and expect that the type of l3 is List<?>.
> On 'l3.add(l3.get(0))': This is a cool trick. The use of 'var' essentially serves the same purpose as invoking a generic method in order to give a capture variable a name:
> <T> void dupFirst(List<T> list) { list.add(list.get(0)); }
> ...
> dupFirst(l1);
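
The generic-method trick in runnable form.  This is the standard 
"capture helper" idiom; the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical class illustrating the capture-helper idiom.
class CaptureHelper {
    // The type parameter T gives the capture variable a name, so the
    // element returned by get(0) is known to fit back into add(...).
    static <T> void dupFirst(List<T> list) {
        list.add(list.get(0));
    }

    static void dupFirstOfWildcard(List<?> l1) {
        // l1.add(l1.get(0)) does not compile: each use of l1 is captured
        // separately, giving two unrelated capture variables.
        dupFirst(l1); // T is inferred as the capture of l1's wildcard
    }
}
```
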
> On the other hand, it's a subtle trick, and the average user isn't going to understand what's going on. (Or, more likely: 'l3.add(l3.get(0))' looks fine to them, but they won't understand why it stops working when that gets refactored to 'l1.add(l1.get(0))'.)
> So, in terms of user experience, it seems like (2) is the desired outcome here.
> That choice isn't without some sacrifice: it would be a nice property if lifting a subexpression out of an expression into its own 'var' declaration yields identical types. Since (2) changes the intermediate type, that doesn't hold. That said, hopefully our mapping function is reasonably unobtrusive...
> How do we define the mapping? "Use the bound" is the easy answer, although in practice it's more complicated than that:
> - Which bound? (upper or lower?)
> - What if the bound contains the capture var?
> - What do you do with a capture variable appearing as an (invariant) type argument?
> - What do you do with a capture variable appearing as a wildcard bound?
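
Two of these questions, illustrated with a sketch that assumes a Java 
10+ compiler implementing what (to my understanding) became the "upward 
projection" of JLS 4.10.5.  The class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical class; assumes a Java 10+ compiler.
class ProjectionCases {
    static Object whichBoundCase(List<? super Integer> list) {
        // get(0) returns CAP with lower bound Integer, upper bound Object.
        // Projecting upward uses the upper bound, so 'first' is Object.
        var first = list.get(0);
        return first;
    }

    static int[] invariantArgCase(Comparator<? super String> cmp) {
        // reversed() returns Comparator<CAP>, where CAP is an invariant type
        // argument; it becomes a wildcard: Comparator<? super String>.
        var rev = cmp.reversed();
        int reversedResult = Integer.signum(rev.compare("a", "b"));
        // This reassignment compiles only because of that wildcard.
        rev = Comparator.<String>naturalOrder();
        int naturalResult = Integer.signum(rev.compare("a", "b"));
        return new int[] { reversedResult, naturalResult };
    }
}
```

Mapping CAP to plain '?' would not be enough here: 'rev.compare("a", 
"b")' needs the lower bound String to survive in the projected type.
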
> We're working on finalizing the details. While this operation isn't trivial, it turns out it's pretty important: we already need it to solve bugs in the type system involving type inference [1] and lambda expressions [2]. It's a useful general-purpose tool.
> —Dan
> [1] https://bugs.openjdk.java.net/browse/JDK-8016196
> [2] https://bugs.openjdk.java.net/browse/JDK-8170887
