forax at univ-mlv.fr forax at univ-mlv.fr
Thu Jan 24 00:14:22 UTC 2019

```> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Brian Goetz" <brian.goetz at oracle.com>, "amber-dev"
> <amber-dev at openjdk.java.net>
> Sent: Thursday, January 24, 2019 00:18:29

> On Jan 23, 2019, at 3:00 PM, Remi Forax <forax at univ-mlv.fr> wrote:

>> the lexer is greedy so there is no ambiguity at parser level.

> Exercise for your students: Characterize the set of all strings of the form
> `A+"-"+B` which can occur (as three tokens) in some valid Java program, where
> `A` and `B` are Java keywords or identifiers, but not both are identifiers.

> Example solutions:
> `A=m,B=int` in `int m=0, n = m-int.class.hashCode()`
> `A=assert,B=m` in `assert-m>0`.

> Non-solution:
> `A=B=do` in `x=redo-done`.

I've been teaching compilation since 2007, and it's still a nice exercise.

Calculating the set of B is easy because, when you do an LL or LR analysis, it's FIRST(Expression). For the set of A, you have to do it manually because it's all the terminals that can appear in the last position of an Expression :(

Browsing the ANTLR 4 grammar (Java 8), I get:
{A} = null, true, false, this, class, new, identifier
{B} = null, true, false, boolean, short, char, int, long, float, double, void, this, new, super, identifier + switch (Java 12)

As a bonus, you also have to take care of keywords preceded by the suffix of a numeric literal, like f-super, because that can appear in 12.7f - super.foo(), for example. It is not an issue for a classical lexer, but it doesn't work if you only use a regex to try to match keywords.

Rémi
```
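
The numeric-suffix caveat at the end of the message can be illustrated with a small Java sketch. Everything in it is hypothetical and simplified: the class and method names (`NumericSuffixDemo`, `naiveFirstMatch`, `lexedTriples`) are made up, the token regex covers only a tiny subset of Java's lexical grammar, and it is not taken from the ANTLR grammar mentioned above. It compares a regex that searches the raw text for `word-word` with a greedy (maximal-munch) tokenizer that first consumes the whole literal `12.7f`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericSuffixDemo {
    // Naive approach (assumption for illustration): search the raw text for word "-" word.
    static final Pattern NAIVE = Pattern.compile(
        "([A-Za-z_][A-Za-z0-9_]*)\\s*-\\s*([A-Za-z_][A-Za-z0-9_]*)");

    // Simplified token shapes: group 1 = numeric literal with optional suffix,
    // group 2 = keyword or identifier, group 3 = any other single character.
    static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(\\d+(?:\\.\\d+)?[fFdDlL]?)|([A-Za-z_][A-Za-z0-9_]*)|(\\S))");

    // Returns the first "A-B" the naive regex finds, or null if there is none.
    static String naiveFirstMatch(String src) {
        Matcher m = NAIVE.matcher(src);
        return m.find() ? m.group(1) + "-" + m.group(2) : null;
    }

    // Greedy tokenizer: at each position, consume the longest token that matches.
    static List<String[]> lex(String src) {
        List<String[]> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) break; // only trailing whitespace is left
            if (m.group(1) != null) tokens.add(new String[] {"NUM", m.group(1)});
            else if (m.group(2) != null) tokens.add(new String[] {"WORD", m.group(2)});
            else tokens.add(new String[] {"OP", m.group(3)});
            pos = m.end();
        }
        return tokens;
    }

    // Collects every WORD "-" WORD triple in the token stream as "A-B".
    static List<String> lexedTriples(String src) {
        List<String[]> t = lex(src);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 < t.size(); i++) {
            if (t.get(i)[0].equals("WORD") && t.get(i + 1)[1].equals("-")
                    && t.get(i + 2)[0].equals("WORD")) {
                out.add(t.get(i)[1] + "-" + t.get(i + 2)[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "12.7f - super.foo()";
        System.out.println(naiveFirstMatch(src));           // f-super (false positive)
        System.out.println(lexedTriples(src));              // []
        System.out.println(lexedTriples("assert-m>0"));     // [assert-m]
        System.out.println(lexedTriples("x=redo-done"));    // [redo-done]
    }
}
```

The naive regex reports `f-super` because it starts matching in the middle of the literal, while the tokenizer sees `12.7f` as a single token and therefore finds no keyword-or-identifier on the left of the `-`. The last call also shows the greedy behaviour from the quoted discussion: `redo-done` is lexed as two identifiers, never as `do-do`.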