From jrl at staffmail.ed.ac.uk Wed Dec 3 03:18:43 2008 From: jrl at staffmail.ed.ac.uk (John Longley) Date: Wed, 03 Dec 2008 11:18:43 +0000 Subject: Static analysis of non-local transfers: a proposal In-Reply-To: <5F2139C8-1413-422E-9A9D-356F97929E25@twistedbanana.demon.co.uk> References: <20081120172753.vphjeep04kosco4k@www.staffmail.ed.ac.uk> <15e8b9d20811221546v32e3f028nc245a4be0b2ac0e9@mail.gmail.com> <9D638248-E45A-43AD-9490-554BFF9936BC@twistedbanana.demon.co.uk> <20081124125821.jngv8pihfk40040g@www.staffmail.ed.ac.uk> <5F2139C8-1413-422E-9A9D-356F97929E25@twistedbanana.demon.co.uk> Message-ID: <20081203111843.zh2ctpjt44gwwso0@www.staffmail.ed.ac.uk> Hi again, Thanks to Mark for his very helpful email on the concurrency issues. As promised, here are some ideas on a simplified scheme for exception tracking which becomes possible if the restrictions outlined in my previous emails are adopted. The headline is that the use of exception type parameters then becomes superfluous, as one can give a naturally exception-transparent system that doesn't require them. In this longish message, I'll describe... (A) a way of setting this up while staying as close as possible to the existing proposal (although still relying on a distinction between function and interface types). (B) an alternative system which seems simpler and cleaner overall (though is admittedly slightly coarser in its analysis of exceptions). (C) some further general thoughts (some rather obvious) on the issue of function/interface type compatibility, expanding on a few of the points in previous messages. (A). The idea of "exception transparency", as I understand it, is that one would like a system in which one could pass a function F with throwables E to a method or function G that may not know about E, but which may have its own throwables E' - and the type system would infer that the throwables of G(F) were E union E'. In the present scheme (v0.5), the formal parameter f of G has to be tagged as possibly raising an arbitrary exception by means of an exception type parameter E, so as to prevent things like: class Bad { { => void} x ; void store ({ => void f}) { x = f ; } void run () { x.invoke() } ; } Bad o = new Bad() ; { class MyException extends Exception ; o.store { => throw new MyException() ;} } o.run() ; // MyException thrown outside its scope! (Note: even if exception type parameters are elided, as suggested in the "Open Issues" document, they would presumably need to be added in for typechecking purposes in order to exclude examples like the above.) However, under my proposed restrictions on the use of store at function types, the above class Bad would be excluded anyway, because it attempts to allow the dynamic behaviour of f to be retained by the longer-lived variable x. In other words, exactly the same restrictions on "flow of dynamic behaviour" which prevent escaping "returns" also prevents unruly exception-raising behaviour. Of course, one might think these restrictions are themselves rather severe, but here my response would be: if it's unrestricted higher-order store you want, you can use objects, not closures. (One can even imagine exception type parameters being added for the sake of allowing exception-transparency when working with interface types - but whether or not one did this, they wouldn't be needed for function types.) To summarize so far, my suggestion would be that if F is an expression of function type, a method application G(F) is allowed whether or not the formal parameter of G explicitly mentions the throwables of F, and the type system knows that the throwables of G(F) are those of G plus those of F. The point is that if F might throw e, then the evaluation of G(F) itself might throw e, but after G has returned, no further appearances of e can arise, since the exception-raising potential of F does not leak into e.g. the fields of the target object. Similarly, if G is itself a higher-order function, G.invoke(F) can be allowed whether or not the argument type of G mentions the throwables of F; again, the throwables of G.invoke(F) are those of G plus those of F. All this works nicely for methods/functions whose return type is not itself a function type. If the return type is a function type (e.g. as in curried functions), some further issues arise, and here my system is admittedly a little crude. In the existing system with exception type variables, it is possible to distinguish among the following three scenarios by appropriately careful placement of the exception type variable E: * the evaluation of G(F) might throw e, but if it doesn't, later invocation of the resulting closure won't throw e. * the evaluation of G(F) won't throw e, but later invocation of the resulting closure might. * either the evaluation or the later invocation might throw e. Under (the most straightforward version of) my proposal, these distinctions can't be made, as the uses of the formal parameter f aren't "tracked" by the type system. Being somewhat conservative, the easiest thing is just to say that the relevant set E union E' must be used both as the throwable set of the expression G(F) and as the "throws" annotation in the type of the resulting function. One can of course do better than this by a more careful analysis of the body of G - in a way, this could be seen as using exception type variables without admitting it ;-) This is a little bit unpleasant - and it obviously gets worse for curried functions with more arguments - but the question is whether the extra control one gets from exception type variables is sufficiently useful in practice to justify the extra machinery that they involve (as well as the cluttering of code, if they're not elided). (B). Although something like the above might be a good option in the context of BGGA, I should mention that a more radical simplication is also possible, in which the coarseness of the above proposal is (so to speak) made systematic. Here the idea is that we don't annotate each arrow appearing in a function type with its own set of exceptions; rather, we annotate the type as a whole with a single set at top-level: Type ::= Type1 Throws_opt Type1 ::= StandardJavaType | { Types1_opt => Type1 } Types1 ::= Type1 | Type1 , Types1 with the proviso that for a standard Java type, the throws clause should be absent. (Thus, the treatment of exceptions for standard Java types is unaffected - only function types are treated in this way.) The assertion "F has type (T throws e_1,...,e_n)" should then be interpreted as saying that the set e_1,...,e_n includes all exceptions that may be either raised by F immediately or squeezed out of it by some later invocation. Of course, this means that various distinctions of the above kind are no longer possible for curried functions - on the other hand, the gain in simplicity may well have its own appeal in the context of an already complex language! Further details of this idea, including a type system worked out for a fragment of ML, can be found in an unpublished draft document available at http://homepages.inf.ed.ac.uk/jrl/Research/exceptions.pdf (Only the system $\EE_1$ described in this document is relevant here.) I currently have a PhD student developing these ideas further (although the focus of his work is more on the denotational semantics for such systems). (C). Finally, a few more thoughts on function/interface type compatibility. (1) I said in an earlier message that for the purpose of static analysis, function and interface types could be distinguished at an early stage and then identified. Having thought about it a bit more, it seems that this should actually be quite easy to do. After parsing, only a rather crude analysis (far less than full typechecking) is needed to identify which variables and (sub)expressions are of function type, which is really all that is needed to verify my conditions. However, I can't resist adding that if function and interface types were distinguished at all at *any* level, I think it would be much better if a separate syntax for function application was provided, rather than e.g. the ".invoke()" form which cries out to be read as a method invocation. (2) If one wishes to guarantee "no late transfers" (in the sequential setting), then as outlined in my second message, a condition on implicitly type-converted expressions is needed. This condition is similar to (actually slightly more liberal than) the one currently adopted for conversion to RestrictedFunction interfaces. So a "hard line" approach would be to forbid all such conversions that don't satisfy the conditions. I imagine, though, that a softer line will seem more attractive: e.g. allow all type-conversions in the interests of retrofitting etc., but issue a compiler warning if they don't conform to the restrictions. That way, if the compiler is able to discover that a program is safe against late transfers, it will share this welcome knowledge with the programmer via the absence of such warning. (3) That all seems very nice if exceptions are left out of the picture. But if one wished to adopt (either version of) my simpler scheme for exception tracking, a little more care is needed over these conversions. Specifically, in the above examples, occurrences of the parameter f within the body of G mustn't be convertible to interface types - or at least not without tagging them somehow as potentially exception-raising - otherwise the whole system becomes exception-unsafe. Various solutions are possible here, but to help me to understand what kinds of type-compatibility are hoped for, I'd like to ask the following question. The current spec describes a fairly generous scheme of "closure conversion" which applies just to closure *literals* (not e.g. to variables). This is all very nice and fits happily with the "soft line" scheme suggested above, plus (probably, with a little care) the simplified treatment of exceptions. So my question is: how important is it that *in addition to* this conversion scheme, closure types are deemed identical (for programming purposes) to certain (synthetic) interface types, so that arbitrary expressions of closure type are automatically of interface type? Are there use cases for which this is important? I appreciate that for compatibility amongst compilers, one wants to require that the types look the same at the back-end - but need they also be the same to the programmer? As I say, even within the present system there are probably various things one might do. But I am really trying to argue that the advantages of distinguishing function and interface types might considerably outweigh the disadvantages ;>) Closures and objects do offer different things, and drawing a clear distinction between them would allow one to exploit these differences to the full! Best wishes, John -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.