Project Lambda: Java Language Specification draft Version 0.1 Copyright © 2010 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. * Introduction This document serves two purposes: design and specification. It expands on the design for lambda expressions, function types, etc, given in the first five sections of the Project Lambda: Straw-Man Proposal (http://cr.openjdk.java.net/~mr/lambda/straw-man/). It specifies those features with reference to existing JLS structure and definitions. Design and specification do not necessarily need to be unified, but we have done so because the JLS is such a rich source of terminology; and because so many of its rules (e.g. the meaning of names) should apply to lambda expressions automatically; and because it keeps the mind focused on pure language issues rather than classfile/VM/library issues. This document does not consider implementation. The mapping of lambda expressions to objects, and of function types to class or interface types, is neither designed nor specified. Even if the mapping was designed here, it is unlikely ever to be specified in the JLS. Binary compatibility for lambda expressions will eventually be specified in terms of changes to function types only. It is a goal of this document to allow the implementer freedom as to how and when lambda expressions are evaluated. This document is a draft. It provides an initial, changeable set of updates to the following JLS chapters: - Expressions - Statements - Names - Types, Values, and Variables - Conversions - Exceptions - Definite Assignment * Notation [x.y.z This is a JLS subsection to be updated] This is new normative text. This is an example, indented two spaces. This is existing normative text *and this is an extension for Lambda* and this is back to existing normative text. [[This is meta-information, typically describing a simple JLS update carried out by augmenting existing normative text with extra phrases.]] /* This is discussion. */ * Issues - Lambda expression evaluation: as an implementation detail, some lambda expressions may be safely (no side-effects) evaluated out of order of the program text. For example, a lambda expression with no free variables could be evaluated by the body of a static initializer of the class containing the lambda expression. The draft tries to take no position on when lambda expression evaluation physically occurs. - Lambda expressions as closures: There are effectively-final variables, but I am holding off shared variables for now. As background reading to why loop variables should not be shared, see http://blogs.msdn.com/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx. Note that a lambda in C++0x requires the programmer to enumerate the variables in lexical scope accessed by the lambda expression. - Lambda instance invocation: Java has separate method and variable namespaces because a name can be syntactically classified as a MethodName or an ExpressionName (JLS3 6.5.1). If method invocation syntax is used as-is for lambda invocation, then more classification rules will be needed. This document avoids new rules by proposing a specific lambda invocation syntax. See the discussion in-line. - Lambda conversion: ambiguities can arise in overload resolution if multiple potentially applicable methods take suitable SAM types in the same argument position. Need a way to disambiguate. - Function types: better syntax is desirable, especially for higher-order types. Happily, while '#' is the first token of a function type and a lambda expression, only a 1-lookahead is needed to disambiguate. - Class literals: Can you get the class literal of a function type, e.g. #int(int).class ? We don't yet know the implementation of function types, so the Class object underlying a lambda expression is unknown. Deferred. * Expressions [15.8 Primary Expressions] Expression: LambdaExpression LambdaExpression: '#' '(' FormalParameterList_opt ')' '(' Expression_opt ')' '#' '(' FormalParameterList_opt ')' Block #()() #()(5) #()(x.m()) #()((foo++)) #()("a"+"b") #()( {1,2,3} ) // Proposed collection literal expression from Coin #(){} #(){return 5;} #(){x.m();} #(){foo++;} [15.8.3 this] The keyword this may be used only in the body of an instance method, instance initializer or constructor, or in the initializer of an instance variable of a class, *or in a lambda expression*. The type of this in a lambda expression is the function type of the lambda expression. /* Treatment of 'this' inside a lambda expression is essentially the same as 'this' inside the body of an anonymous inner class. */ class CountingSorter { int count = 0; void sort(List data) { // Field 'count' is in scope here, so the lambda expression may use it Collections.sort(data, #(String a, String b) { count++; return a.length()-b.length(); }); // Lambda instance has no field 'count', so 'this.count' is a compile-time error Collections.sort(data, #(String a, String b) { this.count++; return a.length()-b.length(); }); } } [15.8.6 Lambda Expressions] A lambda expression is used to create a new object that is a lambda instance (15.8.7). A lambda expression specifies a expression or block of code, followed by a (possibly empty) list of formal parameters to the expression or block. Evaluation of a lambda expression results in a lambda instance that is a reference value (4.3) and may be invoked (15.8.7), or an OutOfMemoryError if insufficient memory is available to allocate a lambda instance. #()(42) is a lambda expression. #(int x)(x+x) is a lambda expression. #(int x, int y) { if (x>y) return x; else return y; } is a lambda expression. /* The following is inspired by 8.4.7 Method Body. In a method body, it's easy to check a return statement with an Expression for assignment compatibility against the return type in the method signature. In a lambda expression, it's not so easy to check a return statement with an Expression because the lambda expression has an implicit type based on its body, not a method signature. Also, note that void is not a type; it indicates "lack of return value" for a method and now a lambda expression also. */ The body of a lambda expression is an expression or a block of code. If the body of a lambda expression is an expression, then the type of the body is the type of the expression. If the body of a lambda expression is a block, then either all or none of the return statements in the block must have an Expression. If no return statement has an Expression, then the body of the lambda expression is void, i.e. has no type. If all return statements have an Expression, then the types of the Expressions must be assignment-compatible with each other, or a compile-time error occurs. The type of the body is lub(T1..Tn) where T1..Tn are the types of the Expressions after boxing conversion. The type of a lambda expression is a function type #T(S1..Sm)(E1..En) where: - If the body of the lambda expression is void, then T is void, indicating no return type; otherwise, T is the type of the body of the lambda expression after capture conversion. - S1..Sm is the list, possibly empty, of types of the formal parameters of the lambda expression. #()() has type #void() #(int x)() has type #void(int) #(int x)(x*x) has type #int(int) #() { if (..) return; else return; } has type #void() #() { if (..) return 1; else return 2; } has type #int() #() { if (..) return "1"; else return 2; } has type #Integer() - E1..En is the list, possibly empty, of the checked exceptions that the body of the lambda expression can throw. /* A divergent lambda expression such as #(){throw new AssertionError();} has no return value, and its body completes abruptly by reason of being a throw with an AssertionError object. For the purpose of calculating the type of the lambda expression, the body of the lambda expression is void. Even if a return statement in a lambda expression is unreachable, it contributes a return type to the body of the lambda expression and hence helps infer the type of the lambda expression: #() { if (false) return 5; throw new Error(); } // has type #int() */ /* The following is inspired by 8.1.3 Inner Classes and Enclosing Instances. */ Any local variable, formal method parameter, or exception handler parameter used but not declared in a lambda expression must be effectively-final. A local variable, formal method parameter, or exception handler parameter is effectively-final if it is never the target of an initialization or assignment expression except where definitely unassigned. /* Since a formal method parameter and an exception handler parameter are definitely assigned before the body of their method/catch block (JLS3 16.3), they can never be the target of an assignment expression when definitely unassigned. That is, assigning to the parameter of a method or exception handler will render the parameter unfit to be closed over by a lambda expression in the method or exception handler. It is as if the parameter is implicitly final, a desirable thing. */ class C { void m(int x) { int y; y = 1; ... #()(x+y) // Legal; x and y are both effectively-final. y = 2; ... #()(x+y) // Illegal; y is not effectively-final. int z = 1; ... #()(z+1) // Legal; z is effectively-final } } It is a compile-time error to modify the value of an effectively-final variable in the body of a lambda expression. [15.8.7 Lambda instance invocation] A lambda instance invocation is used to invoke the body of the lambda expression whose evaluation resulted in the lambda instance serving as receiver of the invocation. StatementExpression: LambdaInvocationExpression PrimaryNoNewArray: LambdaInvocationExpression LambdaInvocationExpression LambdaExpression LambdaInvocation PrimaryNoNewArray LambdaInvocation LambdaInvocation: '!' '(' ArgumentList_opt ')' '.' '(' ArgumentList_opt ')' #()(42)!() // Evaluates to the value 42. #int() fortyTwo = #()(42); assert fortyTwo!() == 42; #int(int) doubler = #(int x)(x + x); assert doubler!(fortyTwo!()) == 84; /* It's reasonable for a method and a variable to have the same name. It's reasonable for a method and a variable to have the same name AND same effective type (method's signature / variable's function type). But I think it's not reasonable for a method and a variable to have the same name AND effective type AND invocation syntax. If that's allowed, then obscuring rules will be needed in JLS 6.5 to favor method invocation over lambda invocation, or vice-versa. An expression that looks like a method invocation in one context could be a lambda invocation in another; moving the expression could silently change its meaning. A programmer would always have to check for a method in scope if a function-typed variable seems to be lambda-invoked, or for a function-typed variable in scope if a method seems to be invoked. Shadowing already makes an expression's meaning be context-dependent, but shadowing is both rare and easy to detect; obscuring between a method and a function-typed variable would be more common and harder to detect. Harder because it involves reasoning about inheritance: of methods if a function-typed variable seems to be invoked, and of function-typed fields if a method seems to be invoked. It would be nice to say that a function-typed variable should be preferred over a method with the same name and effective type, on the grounds that the method can be invoked with Identifier.m(..) or this.m(..). Unfortunately, one member or the other will still be obscured in this case: class C { #int(int) f = #(int x)(x); int f(int x) { return x; } } Specifying that every function type has an invoke() method is plausible but concision is moderately important for lambdas, so it's not my preferred option. */ The type of the receiver must be a function type, or a compile-time error occurs. Because 'this' has function type, it may be used as a receiver: #int(int) factorial = #(int i)(i == 0 ? 1 : i * this!(i - 1)); At runtime, lambda invocation first evaluates the receiver, then the argument expressions (if any), as per 15.12.4.2, then creates an activation frame and transfer control to the body of the lambda instance, similar to 15.12.4.5. * Statements [14.15 The break Statement] It is a compile-time error if a break statement in the body of a lambda expression attempts to transfer control to a target outside the body of the lambda expression. [14.16 The continue statement] It is a compile-time error if a continue statement in the body of a lambda expression attempts to transfer control to a target outside the body of the lambda expression. [14.17 The return statement] A return statement in the body of a lambda expression attempts to transfer control to the invoker of the lambda instance created for the lambda expression. If the return statement has an Expression, the value that results from evaluating the Expression becomes the value of the lambda instance invocation. [14.21 Unreachable statements] The block that is the body of a constructor, method, *lambda expression*, instance initializer or static initializer is reachable. * Names [8.4.1 Formal Parameters] [[Replace all occurrences of "method" with "method or lambda expression". This sets up the scope of a lambda expression's formal parameters correctly, and allows variadic lambda expressions. Copy from 8.4.1 to 6.3: "The scope of a parameter of a method *or lambda expression* or constructor is the entire body of the method *or lambda expression* or constructor."]] * Types, Values, and Variables [4.3 Reference Types and Values] There are *four* kinds of reference types: class types, interface types, array types, *and function types*. *Class types, interface types, and array types* may be parameterized with type arguments. ReferenceType: FunctionType FunctionType: '#' ResultType '(' [Type] ')' FunctionThrows_opt FunctionThrows: '(' 'throws' ExceptionTypeList ')' ExceptionTypeList: Identifier ExceptionTypeList '|' Identifier The notation #T(S1..Sm)(E1..En) indicates a function type with return type T, formal parameter types S1, S2, ..., Sm, and checked exception types E1, E2, ..., En. 'void' may be used in a function type to indicate that the body of a lambda expression has no return value. This occurs if the body of the lambda expression is a block that can either a) complete normally or b) complete abruptly by reason other than being a return with value V. #void() #int() #int(int,int) ##int()(#int(),#int()) // Function that takes two int-returning functions, and returns an int-returning function. #int()(throws IOException) #void(int)(throws NoSuchFieldException|NoSuchMethodException) [4.3.1 Objects] An object is a class instance or an array *or a lambda instance*. A lambda instance is the result of the evaluation of a lambda expression. [4.3.2 The Class Object] The class Object is a superclass of all other classes. A variable of type Object can hold a reference to the null reference or to any object, whether it is an instance of a class or an array *or a lambda expression*. [4.10.4 Subtyping among Function Types] #T(S1..Sm)(E1..En) is a direct supertype of #V(U1..Um)(F1..Fo) iff all of the following hold: - T is a supertype of V. - for i in 1..m: Ui is a supertype of Si. - for j in 1..o: There exists a k in 1..n such that Ek is a supertype of Fj. #Object(String,Integer) is a supertype of #Package(Object,Number). #Object(Object,Object) is also a supertype of #Package(Object,Number). #Object(Object[]) is a supertype of #Object[](Object). Object is a direct supertype of any function type. A function type that is void (i.e. has no return type) and has formal parameter types P1..Pn is a supertype of a function type #T(S1..Sn) iff Si is a supertype of Pi (i in 1..n). #void() is a supertype of #int(). #void(String,Integer) is a supertype of #Package(Object,Number). [4.12.2 Variables] A variable of function type #T(S1..Sn) can hold a null reference or a reference to any lambda instance whose type is a subtype of #T(S1..Sn). [4.12.6 Types, Classes, and Interfaces] [[Needs to be updated w.r.t. "Every object belongs to some particular class." Minimal information about the classes of lambda expressions is probably necessary.]] * Conversions [5.1.14 Lambda Conversion] A SAM (Single Abstract Method) type is a top-level class or interface type with abstract member methods M1..Mn (n>=1) such that if Mi does not have the same signature as any member method declared in Object, and Mj does not have the same signature as any member method declared in Object, then i==j. The single abstract member method of a SAM type that does not have the same signature as any member method declared in Object is called the target abstract method. The signature, return type, and throws clause of the target abstract method are called its descriptor. If a SAM type is a class type, then it must be declared abstract and have a no-args constructor. /* The above definition excludes member methods of Object which are explicitly declared but which would otherwise be implicitly declared. This is to allow a conversion to an interface type like Comparator, which has multiple methods but really only one is "new"; the other method is an explicit declaration of a method which would otherwise be implicitly declared. The above definition deliberately does not treat multiple non-Object abstract methods with compatible signatures as if they represented a single abstract method. This reflects existing practice whereby if an interface or abstract class has multiple such members, it is not possible for a non-abstract class to implement the interface/extend the abstract class simply by providing a single concrete method. */ A lambda conversion exists from a function type #T(S1..Sm)(E1..En) to the descriptor of the target abstract method M of a SAM type, provided that all of the following hold: - If T is not void, then T can be converted to the return type of M by assignment conversion. - If T is void, then M is void or has return type java.lang.Void. - M is not generic and has m formal parameters. - For i in 1..m, the i'th formal parameter of M has type Si. - For j in 1..n, the checked exception type Ej is a subtype of some exception type in the throws clause of M. // This example shows formal parameter types of a function type and a target method being matched. class Foo { #int(T) p = #(T x) ( x.someOpReturningInt() ); interface Bar { int m(T x); } Bar b1 = p; // Illegal; Bar has member m with effective signature #int(String) while p has signature #int(T). Bar b2 = p; // Legal; Bar has member m with effective signature #int(T), just like p does. } // This example shows why the target abstract method must not be generic. class Foo { #int(T) p = #(T x) ( x.someOpReturningInt() ); interface Bar { int m(T x); } Bar b = p; // Illegal. Bar has an infinite number of members called m. // The formal parameter type of b.m depends on how it is invoked, i.e. on the types of the actual parameters. // Sadly p's non-generic function type #int(...) is specific in the types of actual parameters it can accept. // Note that p's type #int(T) may be called "parameterized", since it uses a type parameter, // but it is not "generic", since it does not introduce its own type parameter. } // This examples shows (via a counter-example) why an abstract class must have a no-args constructor. abstract class Foo { Foo(int x) {} int m() {..} } class FooImpl extends Foo { FooImpl(int x) { super(x); } int m() {..} } #int() x = #()(42); Foo f = new FooImpl(); // Illegal, of course. Foo f = x; // x has no argument for Foo's constructor, so must be illegal like the FooImpl() line. /* A lambda conversion from a function type to a descriptor is not a kind of widening reference conversion. The type of a lambda expression is not a subtype of any interface type including any SAM type. There is no inheritance of a SAM type's members by a lambda expression which is converted to a SAM type's target method. Thus, the members of a SAM type are not in scope (i.e. cannot be accessed by simple names) in the body of the lambda expression. This makes lambda expressions less powerful than anonymous inner classes. Given: void m(TimerTask tt) {..} an anonymous inner class instance passed to m() would necessarily be a subtype of TimerTask: m( new TimerTask() { public void run() {..} } ); so method bodies in the anonymous inner class have TimerTask members in scope by inheritance. Therefore, it is convenient for the body of a lambda expression to have access to members of the SAM type. To achieve this, I am thinking that 'this' in the body of the lambda expression may be cast to the SAM type: // Given: void m(TimerTask tt) { ... } // Illegal; lambda instance has no 'cancel' method m( #() { this.cancel(); } ); // Illegal if there is no 'cancel' method in scope at the call site. // Legal if there is. m( #() { cancel(); } ); // Legal m( #() { ((TimerTask)this).cancel(); } ); */ [5.2] Assignment Conversion Assignment contexts allow the use of one of the following: - a lambda conversion (5.1.14). [5.3] Method Invocation Conversion Method invocation contexts allow the use of one of the following: - a lambda conversion (5.1.14). [5.5] Casting Conversion Casting contexts allow the use of: - a lambda conversion (5.1.14). * Exceptions [11.2.1 Exception Analysis of Expressions] A lambda invocation expression on a lambda expression of type #T(S1..Sm)(X1..Xn) can throw an exception type E iff either: - some expression of the argument list can throw E, or - there exists an i in 1..n such that Xi is E. [11.2.2 Exception Analysis of Statements] A method or constructor body, or a lambda expression whose body is a block, can throw an exception type E iff some statement in the body can throw an exception type E. * Definite Assignment [16.3 Definite Assignment and Parameters] A formal parameter V of a method or constructor *or lambda expression* is definitely assigned (and moreover is not definitely unassigned) before the body of the method or constructor *or lambda expression*. [16.10 Definite Assignment and Lambda Expressions] A variable V is [un]assigned after a lambda expression iff it is [un]assigned before the lambda expression. A variable V used in the body of a lambda expression but declared outside the body of the lambda expression is: - definitely assigned before the body of the lambda expression iff it is definitely assigned before the lambda expression; and - not definitely unassigned before the body of the lambda expression.