Next up for patterns: type patterns in switch

Brian Goetz brian.goetz at
Thu Jul 23 16:20:01 UTC 2020

In case I wasn't clear, this was a proposal to proceed on _right now_.  
There's been no comments so far; should I take that as "perfect, ship it?"

On 6/24/2020 10:44 AM, Brian Goetz wrote:
> There are a lot of directions we could take next for pattern 
> matching.  The one that builds most on what we've already done, and 
> offers significant incremental expressiveness, is extending the type 
> patterns we already have to a new context: switch.  (There is still 
> plenty of work to do on deconstruction patterns, pattern assignment, 
> etc, but these require more design work.)
> Here's an overview of where I think we are here.
> [JEP 305][jep305] introduced the first phase of [pattern 
> matching][patternmatch]
> into the Java language.  It was deliberately limited, focusing on only 
> one kind
> of pattern (type test patterns) and one linguistic context (`instanceof`).
> Having introduced the concept to Java developers, we can now extend 
> both the
> kinds of patterns and the linguistic context where patterns are used.
> ## Patterns in switch
> The obvious next context in which to introduce pattern matching is 
> `switch`;  a
> switch using patterns as `case` labels can replace `if .. else if` 
> chains with
> a more direct way of expressing a multi-way conditional.
> Unfortunately, `switch` is one of the most complex, irregular 
> constructs we have
> in Java, so we must teach it some new tricks while avoiding some 
> existing traps.
> Such tricks and traps may include:
> **Typing.**  Currently, the operand of a `switch` may only be one of the
> integral primitive types, the box type of an integral primitive, 
> `String`, or an
> `enum` type.  (Further, if the `switch` operand is an `enum` type, the 
> `case`
> labels must be _unqualified_ enum constant names.)  Clearly we can 
> relax this
> restriction to allow other types, and constrain the case labels to only be
> patterns that are applicable to that type, but it may leave a seam of 
> "legacy"
> vs "pattern" switch, especially if we do not adopt bare constant 
> literals as
> the denotation of constant patterns.  (We have confronted this issue 
> before with
> expression switch, and concluded that it was better to rehabilitate 
> the `switch`
> we have rather than create a new construct, and we will make the same 
> choice
> here, but the cost of this is often a visible seam.)
> **Parsing.**  The grammar currently specifies that the operand of a 
> `case` label
> is a `CaseConstant`, which casts a wide syntactic net, later narrowed with
> post-checks after attribution.  This means that, since parsing is done 
> before we
> know the type of the operand, we must be watchful for ambiguities between
> patterns and expressions (and possibly refine the production for 
> `case` labels.)
> **Nullity.**  The `switch` construct is currently hostile to `null`, 
> but some
> patterns do match `null`, and it may be desirable if nulls can be handled
> within a suitably crafted `switch`.
> **Exhaustiveness.**  For switches over the permitted subtypes of 
> sealed types,
> we will want to be able to do exhaustiveness analysis -- including for 
> nested
> patterns (i.e., if `Shape`  is `Circle` or `Rect`, then `Box(Circle 
> c)` and
> `Box(Rect r)` are exhaustive on `Box<Shape>`.)
> **Fallthrough.**  Fallthrough is everyone's least favorite feature of 
> `switch`,
> but it exists for a reason.  (The mistake was making fallthrough the 
> default
> behavior, but that ship has sailed.)  In the absence of an OR pattern
> combinator, one might find fallthrough in switch useful in conjunction 
> with
> patterns:
> ```
> case Box(int x):
> case Bag(int x):
>     // use x
> ```
> However, it is likely that we will, at least initially, disallow 
> falling out
> of, or into, a case label with binding variables.
> #### Translation
> Switches on primitives and their wrapper types are translated using the
> `tableswitch` or `lookupswitch` bytecodes; switches on strings and 
> enums are
> lowered in the compiler to switches involving hash codes (for strings) or
> ordinals (for enums.)
> For switches on patterns, we would need a new strategy, one likely 
> built on
> `invokedynamic`, where we lower the cases to a densely numbered `int` 
> switch,
> and then invoke a classifier function with the operand which tells us 
> the first
> case number it matches.  So a switch like:
> ```
> switch (o) {
>     case P: A
>     case Q: B
> }
> ```
> is lowered to:
> ```
> int target = indy[BSM=PatternSwitch, args=[P,Q]](o)
> switch (target) {
>     case 0: A
>     case 1: B
> }
> ```
> A symbolic description of the patterns is provided as the bootstrap 
> argument
> list, which builds a decision tree based on analysis of the patterns 
> and their
> target types.
> #### Guards
> No matter how rich our patterns are, it is often the case that we will 
> want
> to provide additional filtering on the results of a pattern:
> ```
> if (shape instanceof Cylinder c && c.color() == RED) { ... }
> ```
> Because we use `instanceof` as part of a boolean expression, it is easy to
> narrow the results by conjoining additional checks with `&&`.  But in 
> a `case`
> label, we do not necessarily have this opportunity.  Worse, the 
> semantics of
> `switch` mean that once a `case` label is selected, there is no way to say
> "oops, forget it, keep trying from the next label".
> It is common in languages with pattern matching to support some form 
> of "guard"
> expression, which is a boolean expression that conditions whether the case
> matches, such as:
> ```
> case Point(var x, var y)
>     __where x == y: ...
> ```
> Bindings from the pattern would have to be available in guard expressions.
> Syntactic options (and hazards) for guards abound; users would 
> probably find it
> natural to reuse `&&` to attach guards to patterns; C# has chosen 
> `when` for
> introducing guards; we could use `case P if (e)`, etc. Whatever we do 
> here,
> there is a readability risk,  as the more complex guards are, the 
> harder it is
> to tell where the case label ends and the "body" begins.  (And worse 
> if we allow
> switch expressions inside guards.)
> An alternate to guards is to allow an imperative `continue` statement in
> `switch`, which would mean "keep trying to match from the next 
> label."  Given
> the existing semantics of `continue`, this is a natural extension, but 
> since
> `continue` does not currently have meaning for switch, some work would 
> have to
> be done to disambiguate continue statements in switches enclosed in 
> loops.  The
> imperative version is strictly more expressive than most reasonable 
> forms of the
> declarative version, but users are likely to prefer the declarative 
> version.
> ## Nulls
> Almost no language design exercise is complete without some degree of 
> wrestling
> with `null`.  As we define more complex patterns than simple type 
> patterns, and
> extend constructs such as `switch` (which have existing opinions about 
> nullity)
> to support patterns, we need to have a clear understanding of which 
> patterns
> are nullable, and separate the nullity behaviors of patterns from the 
> nullity
> behaviors of those constructs which use patterns.
> ## Nullity and patterns
> This topic has a number of easily-tangled concerns:
>  - **Construct nullability.**  Constructs to which we want to add pattern
>    awareness (`instanceof`, `switch`) already have their own opinion about
>    nulls.  Currently, `instanceof` always says false when presented with a
>    `null`, and `switch` always NPEs.  We may, or may not, wish to 
> refine these
>    rules in some cases.
>  - **Pattern nullability.**  Some patterns clearly would never match 
> `null`
>    (such as deconstruction patterns), whereas others (an "any" 
> pattern, and
>    surely the `null` constant pattern) might make sense to match null.
>  - **Refactoring friendliness.**  There are a number of cases that we 
> would like
>    to freely refactor back and forth, such as certain chains of `if 
> ... else if`
>    with switches.
>  - **Nesting vs top-level.**  The "obvious" thing to do at the top 
> level of a
>    construct is not always the "obvious" thing to do in a nested 
> construct.
>  - **Totality vs partiality.**  When a pattern is partial on the 
> operand type
>    (e.g., `case String` when the operand of switch is `Object`), it is 
> almost
>    never the case we want to match null (except in the case of the `null`
>    constant pattern), whereas when a pattern is total on the operand 
> type (e.g.,
>    `case Object` in the same example), it is more justifiable to match 
> null.
>  - **Inference.**  It would be nice if a `var` pattern were simply 
> inference for
>    a type pattern, rather than some possibly-non-denotable union.
> As a starting example, consider:
> ```
> record Box(Object o) { }
> Box box = ...
> switch (box) {
>     case Box(Chocolate c):
>     case Box(Frog f):
>     case Box(var o):
> }
> ```
> It would be highly confusing and error-prone for either of the first two
> patterns to match `Box(null)` -- given that `Chocolate` and `Frog` 
> have no type
> relation, it should be perfectly safe to reorder the two. But, because 
> the last
> pattern seems so obviously total on boxes, it is quite likely that 
> what the
> author wants is to match all remaining boxes, including those that 
> contain null.
> (Further, it would be terrible if there were _no_ way to say "Match 
> any `Box`,
> even if it contains `null`.  (While one might initially think this 
> could be
> repaired with OR patterns, imagine that `Box` had _n_ components -- 
> we'd need to
> OR together _2^n_ patterns, with complex merging, to express all the 
> possible
> combinations of nullity.))
> Scala and C# took the approach of saying that "var" patterns are not 
> just type
> inference, they are "any" patterns -- so `Box(Object o)` matches boxes
> containing a non-null payload, where `Box(var o)` matches all boxes.  This
> means, unfortunately, that `var` is not mere type inference -- which 
> complicates
> the role of `var` in the language considerably.  Users should not have 
> to choose
> between the semantics they want and being explicit about types; these 
> should be
> orthogonal choices.  The above `switch` should be equivalent to:
> ```
> Box box = ...
> switch (box) {
>     case Box(Chocolate c):
>     case Box(Frog f):
>     case Box(Object o):
> }
> ```
> and the choice to use `Object` or `var` should be solely one of 
> whether the
> manifest types are deemed to improve or impair readability.
> #### Construct and pattern nullability
> Currently, `instanceof` always says `false` on `null`, and `switch` always
> throws on `null`.  Whatever null opinions a construct has, these are 
> applied
> before we even test any patterns.
> We can formalize the intuition outlined above as: type patterns that 
> are _total_
> on their target operand (`var x`, and `T t` on an operand of type `U`, 
> where `U
> <: T`) match null, and non-total type patterns do not. (Another way to say
> this is: a `var` pattern is the "any" pattern, and a type pattern that 
> is  total
> on its operand type is also an "any" pattern.)  Additionally, the `null`
> constant pattern matches null.  These are the _only_ nullable patterns.
> In our `Box` example, this means that the last case (whether written 
> as `Box(var
> o)` or `Box(Object o)`) matches all boxes, including those containing null
> (because the nested pattern is total on the nested operand), but the 
> first two
> cases do not.
> If we retain the current absolute hostility of `switch` to nulls, we can't
> trivially refactor from
> ```
> switch (o) {
>     case Box(Chocolate c):
>     case Box(Frog f):
>     case Box(var o):
> }
> ```
> to
> ```
> switch (o) {
>     case Box(var contents):
>         switch (contents) {
>             case Chocolate c:
>             case Frog f:
>             case Object o:
>         }
>     }
> }
> ```
> because the inner `switch(contents)` would NPE before we tried to 
> match any of
> the patterns it contains.  Instead, the user would explicitly have to 
> do an `if
> (contents == null)` test, and, if the intent was to handle `null` in 
> the same
> way as the `Object o` case, some duplication of code would be needed.  
> We can
> address this sharp corner by slightly relaxing the null-hostility of 
> `switch`,
> as described below.
> A similar sharp corner is the decomposition of a nested pattern `P(Q)` 
> into
> `P(alpha) & alpha instanceof Q`; while this is intended to be a 
> universally
> valid transformation, if P's 1st component might be null and Q is 
> total,  this
> transformation would not be valid because of the existing (mild) 
> null-hostility
> of `instanceof`.  Again, we may be able to address this by adjusting 
> the rules
> surrounding `instanceof` slightly.
> ## Generalizing switch
> The refactoring example above motivates why we might want to relax the
> null-handling behavior of `switch`.  On the other hand, the one thing the
> current behavior has going for it is that at least the current 
> behavior is easy
> to reason about; it always throws when confronted with a `null`.  Any 
> relaxed
> behavior would be more complex; some switches would still have to 
> throw (for
> compatibility with existing semantics), and some (which can't be expressed
> today) would accept nulls.  This is a tricky balance to achieve, but I 
> think we
> have a found a good one.
> A starting point is that we don't want to require readers to do an _O(n)_
> analysis of each of the `case` labels just to determine whether a 
> given switch
> accepts `null` or not; this should be an _O(1)_ analysis.  (We do not 
> want to
> introduce a new flavor of `switch`, such as `switch-nullable`; this 
> might seem
> to fix the proximate problem but would surely create others. As we've 
> done with
> expression switch and patterns, we'd rather rehabilitate `switch` than 
> create
> an almost-but-not-quite-the-same variant.)
> Let's start with the null pattern, which we'll spell for sake of 
> exposition
> `case null`.  What if you were allowed to say `case null` in a switch, 
> and the
> switch would do the obvious thing?
> ```
> switch (o) {
>     case null -> System.out.println("Ugh, null");
>     case String s -> System.out.println("Yay, non-null: " + s);
> }
> ```
> Given that the `case null` appears so close to the `switch`, it does 
> not seem
> confusing that this switch would match `null`; the existence of `case 
> null` at
> the top of the switch makes it pretty clear that this is intended 
> behavior.  (We
> could further restrict the null pattern to being the first pattern in 
> a switch,
> to make this clearer.)
> Now, let's look at the other end of the switch -- the last case.  What 
> if the
> last pattern is a total pattern?  (Note that if any `case` has a total 
> pattern,
> it _must_ be the last one, otherwise the cases after that would be 
> dead, which
> would be an error.)  Is it also reasonable for that to match null?  
> After all,
> we're saying "everything":
> ```
> switch (o) {
>     case String s: ...
>     case Object o: ...
> }
> ```
> Under this interpretation, the switch-refactoring anomaly above goes away.
> The direction we're going here is that if we can localize the 
> null-acceptance of
> switches in the first (is it `case null`?) and last (is it total?) 
> cases, then
> the incremental complexity of allowing _some_ switches to accept null 
> might be
> outweighed by the incremental benefit of treating `null` more 
> uniformly (and
> thus eliminating the refactoring anomalies.)  Note also that there is 
> no actual
> code compatibility issue; this is all mental-model compatibility.
> So far, we're suggesting:
>  - A switch with a constant `null` case  will accept nulls;
>  - If present, a constant `null` case must go first;
>  - A switch with a total (any) case matches also accepts nulls;
>  - If present, a total (any) case must go last.
> #### Relocating the problem
> It might be more helpful to view these changes as not changing the 
> behavior of
> `switch`, but of the `default` case of `switch`.  We can equally well 
> interpret
> the current behavior as:
>  - `switch` always accepts `null`, but matching the `default` case of 
> a `switch`
>    throws `NullPointerException`;
>  - any `switch` without a `default` case has an implicit do-nothing 
> `default`
>    case.
> If we adopt this change of perspective, then `default`, not `switch`, 
> is in
> control of the null rejection behavior -- and we can view these changes as
> adjusting the behavior of `default`.  So we can recast the proposed 
> changes as:
>   - Switches accept null;
>   - A constant `null` case will match nulls, and must go first;
>   - A total switch (a switch with a total `case`) cannot have a 
> `default` case;
>   - A non-total switch without a `default` case gets an implicit 
> do-nothing
>     `default` case;
>   - Matching the (implicit or explicit) default case with a `null` operand
>     always throws NPE.
> The main casualty here is that the `default` case does not mean the same
> thing as `case var x` or `case Object o`.  We can't deprecate 
> `default`, but
> for pattern switches, it becomes much less useful.
> #### What about method (declared) patterns?
> So far, we've declared all patterns, except the `null` constant 
> pattern and the
> total (any) pattern, to not match `null`.  What about patterns that are
> explicitly declared in code?  It turns out we can rule out these matching
> `null` fairly easily.
> We can divide declared patterns into three kinds: deconstruction 
> patterns (dual
> to constructors), static patterns (dual to static methods), and instance
> patterns (dual to instance methods.)  For both deconstruction and instance
> patterns, the match target becomes the receiver; method bodies are never
> expected to deal with the case where `this == null`.
> For static patterns, it is conceivable that they could match `null`, 
> but this
> would put a fairly serious burden on writers of static patterns to 
> check for
> `null` -- which they would invariably forget, and many more NPEs would 
> ensue.
> (Think about writing the pattern for `Optional.of(T t)` -- it would be
> overwhelmingly likely we'd forget to check the target for nullity.)  
> SO there
> is a strong argument to simply say "declared patterns never match 
> null", to
> not put writers of such patterns in this situation.
> So, only the top and bottom patterns in a switch could match null; if 
> the top
> pattern is not `case null`, and the bottom pattern is not total, then 
> the switch
> throws NPE on null, otherwise it accepts null.
> #### Adjusting instanceof
> The remaining anomaly we had was about unrolling nested patterns when 
> the inner
> pattern is total.  We can plug this by simply outlawing total patterns in
> `instanceof`.
> This may seem like a cheap trick, but it makes sense on its own.  If the
> following statement was allowed:
> ```
> if (e instanceof var x) { X }
> ```
> it would simply be confusing; on the one hand, it looks like it should 
> always
> match, but on the other, `instanceof` is historically null-hostile.  
> And, if the
> pattern always matches, then the `if` statement is silly; it should be 
> replaced
> with:
> ```
> var x = e;
> X
> ```
> since there's nothing conditional about it.  So by banning "any" 
> patterns on the
> RHS of `instanceof`, we both avoid a confusion about what is going to 
> happen,
> and we prevent the unrolling anomaly.
> For reasons of compatibility, we will have to continue to allow
> ```
> if (e instanceof Object) { ... }
> ```
> which succeeds on all non-null operands.
> We've been a little sloppy with the terminology of "any" vs "total"; 
> note that
> in
> ```
> Point p;
> if (p instanceof Point(var x, var y)) { }
> ```
> the pattern `Point(var x, var y)` is total on `Point`, but not an 
> "any" pattern
> -- it still doesn't match on p == null.
> On the theory that an "any" pattern in `instanceof` is silly, we may 
> also want
> to ban other "silly" patterns in `instanceof`, such as constant 
> patterns, since
> all of the following have simpler forms:
> ```
> if (x instanceof null) { ... }
> if (x instanceof "") { ... }
> if (i instanceof 3) { ... }
> ```
> In the first round (type patterns in `instanceof`), we mostly didn't 
> confront
> this issue, saying that `instanceof T t` matched in all the cases where
> `instanceof T` would match.  But given that the solution for `switch` 
> relies
> on "any" patterns matching null, we may wish to adjust the behavior of
> `instanceof` before it exits preview.
> [jep305]:
> [patternmatch]: pattern-match.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the amber-spec-experts mailing list