Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)

Brian Goetz brian.goetz at oracle.com
Wed May 22 15:45:47 UTC 2019

We’ve been drilling into the spec and implementation of yield as a contextual keyword.  We have three possible strategies, all of which are specifiable and implementable, but with tradeoffs.  

The “dumb strategy” would be to say that `yield` is a keyword when it appears in the first position of a statement production (e.g., after an open brace or a semicolon). This is simple to spec and simple to implement, but it doesn’t do so well with variables named `yield`, since statements like these all begin with `yield`:

    yield = 3;
    if (foo)
        yield += 3;
    yield[3] = 4;

The “smart strategy” says that `yield` is a keyword only within the context of the YieldStatement production; the rest of the time it is an identifier.  This is also simple to spec, and does the right thing in all unambiguous cases, but requires unbounded lookahead, which compiler implementations may not like.  The one ambiguous case is `yield` followed by a single parenthesized expression, as in

    yield (x);

which would match both YieldStatement and ExpressionStatement, and here we bias towards YieldStatement.  Code that wants a naked yield() invocation can qualify the invocation, for example:

    this.yield(x);

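To make the qualification rule concrete, here is a small sketch of a class that declares a method named yield, calls it in qualified form, and also uses yield as a statement inside a switch expression. The class name YieldDemo and its members are made up for illustration, and the sketch assumes a JDK where switch expressions are final (14 or later).

```java
// Hypothetical example class; none of these names come from the spec.
final class YieldDemo {
    private int total = 0;

    // A method that happens to be named `yield`. Declaring it is fine,
    // since `yield` is only contextually reserved (compare Thread.yield()).
    int yield(int amount) {
        total += amount;
        return total;
    }

    int run() {
        // An unqualified `yield(3);` here would be taken as a yield statement,
        // not as a method call, so the invocation is qualified instead.
        this.yield(3);
        return total;
    }

    static int describe(int n) {
        return switch (n) {
            case 0 -> 0;
            default -> {
                int doubled = n * 2;
                yield doubled; // `yield` used as a statement
            }
        };
    }
}
```

For a static method, qualifying with the class name instead of `this` serves the same purpose.
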
The “compromise” strategy is like the smart strategy, except that it gets by with fixed lookahead at the cost of missing a few more method invocation cases.  Here, we look at the tokens that follow the identifier yield, and use those to determine whether to classify yield as a keyword or identifier.  (We’d choose identifier if the next token is an assignment op (=, +=, etc), left-bracket, dot, or one of a few others, or if the next two tokens form one of a few two-token sequences (e.g., ++ and then semicolon); that makes it lookahead(2).)
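
A rough sketch of how such a lookahead(2) classifier might look, working on already-lexed token strings. The class name, the enum, and especially the exact token set are assumptions for illustration; the "few others" mentioned above are not enumerated here, so the set below is deliberately incomplete.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the javac implementation.
final class YieldClassifier {
    enum Kind { KEYWORD, IDENTIFIER }

    // Assumed single-token followers that mark `yield` as an identifier:
    // some assignment operators, left-bracket, and dot.
    private static final Set<String> IDENTIFIER_FOLLOWERS =
            Set.of("=", "+=", "-=", "*=", "/=", "%=", "[", ".");

    // `following` holds the tokens that come right after `yield`;
    // only the first two are ever inspected.
    static Kind classify(List<String> following) {
        String t1 = following.size() > 0 ? following.get(0) : "";
        String t2 = following.size() > 1 ? following.get(1) : "";

        if (IDENTIFIER_FOLLOWERS.contains(t1)) {
            return Kind.IDENTIFIER;
        }
        // Two-token case: `yield++;` or `yield--;` is a use of a variable,
        // whereas `yield ++x;` yields the value of `++x`.
        if ((t1.equals("++") || t1.equals("--")) && t2.equals(";")) {
            return Kind.IDENTIFIER;
        }
        // Everything else, including a following "(", is treated as a keyword.
        return Kind.KEYWORD;
    }
}
```

With this sketch, `classify(List.of("=", "3"))` gives IDENTIFIER, while `classify(List.of("(", "3"))` gives KEYWORD, which is why unqualified method invocations need qualification under this strategy.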

The main difference between the compromise strategy and the smart strategy is the handling of method invocations that are not unary:

    yield(3, 4)

In the smart strategy, we’d figure out that this is a method call; in the compromise strategy, we’d require qualification just as we do with the unary method.  

The compromise strategy misses some cases we could parse unambiguously, but also offers a simpler user model: always qualify invocations of methods called yield when used as expression statements.  And it offers the better lookahead behavior, which will make life easier for IDEs.  

So my recommendation here is the compromise strategy.  

> On May 21, 2019, at 10:50 AM, Tagir Valeev <amaembo at gmail.com> wrote:
> I discussed this with colleagues and can confirm that for IntelliJ
> IDEA parser it will be no problem to always consider yield as a
> statement. At least it's much easier than to consider it as a
> statement inside switchy blocks only.
> With best regards,
> Tagir Valeev.
> On Tue, May 21, 2019 at 12:38 PM Tagir Valeev <amaembo at gmail.com> wrote:
>>> So does this (option B plus your No) mean that IDEs would tend to color "yield" as a keyword (at the beginning of a statement) even if followed by "("?
>> My "No" was mostly against options C and D where symbol resolution
>> affects the parse tree. Sorry if it wasn't clear from my message. When
>> the context for the parsing is available inside the same Java file,
>> it's usually ok. See the 'var' restricted keyword:
>> var var = 10; // first is highlighted as type, second as local variable
>> var = 20; // var is highlighted as local variable, despite it's at the
>> beginning of a statement.
>> var(1); // var is highlighted as a method call, despite it's at the
>> beginning of a statement.
>> We have no very big problems parsing this.
>> With best regards,
>> Tagir Valeev.
>> On Tue, May 21, 2019 at 2:52 AM John Rose <john.r.rose at oracle.com> wrote:
>>> On May 20, 2019, at 8:24 AM, Tagir Valeev <amaembo at gmail.com> wrote:
>>>> Assuming that we agreed on 'yield' the option B seems the most attractive. A big No to context-specific parse tree. It's a complete pain to IDEs. Don't forget that IDE often deals with incomplete code, missing dependencies, etc., and still needs to provide reasonable highlighting and completion. Imagine that 'yield' method is available via import static Foo.* or superclass. In this case we don't want to look into other files to build a correct parse tree.
>>> So does this (option B plus your No) mean that IDEs would
>>> tend to color "yield" as a keyword (at the beginning of a
>>> statement) even if followed by "("?
>>> I suppose that would work.  It's hard to predict what that
>>> would feel like, but it's logical.
>>> — John
