[ string literals ] Extending the escape language (was: String literals: some principles)
brian.goetz at oracle.com
Tue May 7 22:14:59 UTC 2019
> TL;DR: Good framework; must also account for the
> rectangle extraction rule (RER). A unified escape
> sublanguage (ESL) is highly desirable, and I propose
> adding <\ > and <\ LT WS*> as escapes for space
> and for null string. The existing \ char is OK, and
> should be "fattened" as a separate feature. I note
> some issues with <\ u X X X X>.
Agree in general with the desire to extend the ESL with some whitespace sequences, though I take issue with the syntax of \<nl> and \<space>. I also have some alternate ideas regarding \uxxxx.
First, unicode escapes. Alex pointed out offline that we had worked our way into a linear thinking trap (again). In the first round, because we were focused on raw strings, we turned off \uxxxx processing in the body of a raw string, which raised the question of “how do we turn it back on.” He also pointed out that, while we use the same escape character for both, they occupy very different places in the language; the ESL is purely about string literals, whereas \uxxxx is purely a lexing concern.
His recommendation, which (now that it’s been explained to me) I strongly agree with, is: let’s not have this feature touch unicode processing at all. Let’s just leave unicode processing as is, using \uxxxx, whether in code, SLSLs, MLSLs, or any future “raw” SLs. The similarity between \n and \uxxxx is purely coincidental. And if we really want the characters “\u0000” in a string literal, well, we know how to escape the \.
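To make the lexing-versus-literal distinction concrete, here is a minimal sketch in today's Java. The class and method names are made up for illustration. The comments deliberately avoid spelling out a unicode escape, since the translation step applies inside comments too.

```java
public class UnicodeEscapeDemo {
    // The compiler translates the escape in this literal before it
    // tokenizes the string, so the literal holds the single letter A.
    static String translated() {
        return "\u0041";
    }

    // Doubling the backslash stops the early translation; the ordinary
    // string-literal escape then yields one backslash followed by u0041.
    static String escaped() {
        return "\\u0041";
    }

    public static void main(String[] args) {
        System.out.println(translated() + " has length " + translated().length());
        System.out.println(escaped() + " has length " + escaped().length());
    }
}
```

Running it prints "A has length 1" and then "\u0041 has length 6": the first literal was rewritten by the lexer, while the second went through the string escape language only.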
Which brings us to \<eol> and \<space>. My main complaint here is that I am really uncomfortable using \<space> for “literal space”, because trailing whitespace is invisible: at the end of a line, one cannot differentiate between \<eol> and \<space> when reading the code. Alternatives include \_, or \s, or \., or … many others.
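A rough way to see the readability problem, sketched in plain Java rather than in any proposed syntax. The three constants are hypothetical source lines held as ordinary strings, and asSeenByReader is an assumed stand-in for what a person sees when trailing whitespace does not render.

```java
public class TrailingEscapeDemo {
    // Hypothetical source line ending in a backslash (the line-continuation form).
    static final String CONTINUATION = "    red \\";
    // Hypothetical source line ending in a backslash and a space (the literal-space form).
    static final String LITERAL_SPACE = "    red \\ ";
    // The same intent written with a visible escape, here backslash followed by s.
    static final String VISIBLE_SPACE = "    red \\s";

    // Approximates the reader's view by dropping trailing whitespace.
    static String asSeenByReader(String sourceLine) {
        return sourceLine.replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        // The two forms differ as text but look the same on screen.
        System.out.println(CONTINUATION.equals(LITERAL_SPACE));
        System.out.println(asSeenByReader(CONTINUATION).equals(asSeenByReader(LITERAL_SPACE)));
        // A visible escape stays distinguishable.
        System.out.println(asSeenByReader(CONTINUATION).equals(asSeenByReader(VISIBLE_SPACE)));
    }
}
```

The first two lines of output are false and true, which is exactly the complaint: different meanings, same appearance. Any of the visible alternatives avoids this, and the sketch does not favor one over another.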