String literals: some principles
brian.goetz at oracle.com
Sun Apr 28 20:32:05 UTC 2019
I would like to point out a key principle that has guided this second round of exploration on string literals, and mention how it might guide the next round (without actually diving into that round).
Classic string literals and the new “fat” string literals are now recognizable as variations on the same feature, each adapted to its niche (single-line vs. multi-line). The “escape language” supported by both is identical, and should stay that way; the only differences are the delimiter and the handling of artifacts of embedding a snippet of foreign text in a traditionally-indented Java program. (Even their delimiters are similar.) This is a good thing.
Looking ahead to the next round, we can build on this. In the first round, we mistakenly thought that there was something that could reasonably be called a “raw” string, but this notion is a fantasy; no string literal is so raw that it can’t recognize its closing delimiter. So “rawness” is really only a matter of degree.
We can characterize a string literal language as:
- Opening delimiter
- Closing delimiter
- Escape characters, if any
- Escape sublanguages, if any
That is, we process ordinary characters until we encounter either the closing delimiter, or one of the escape characters. When we encounter an escape character, we process a “program” from the escape language, and then go back to processing ordinary characters.
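The processing loop just described can be sketched in a few lines of Java. This is only an illustration of the model, not a proposed implementation: the class name LiteralScanner, the method scan, and the tiny ESCAPES table are all hypothetical, the escape language is cut down to one-character “programs” (with \\ added as an assumption), and the indentation handling that multi-line literals need is left out entirely.

```java
import java.util.Map;

/** Sketch of the model: ordinary characters, a closing delimiter, and an escape character. */
final class LiteralScanner {
    // Hypothetical, deliberately tiny escape language: one-character "programs" only.
    static final Map<Character, Character> ESCAPES =
            Map.of('n', '\n', 't', '\t', '"', '"', '\\', '\\');

    /**
     * Scans one literal that starts at the beginning of src and returns its value.
     * The same delimiter opens and closes the literal; escape is the escape character
     * (passed as a String so that longer escape prefixes can be tried as well).
     */
    static String scan(String src, String delimiter, String escape) {
        if (!src.startsWith(delimiter)) {
            throw new IllegalArgumentException("missing opening delimiter");
        }
        StringBuilder out = new StringBuilder();
        int i = delimiter.length();
        while (i < src.length()) {
            if (src.startsWith(escape, i)) {
                // Escape character: run one "program" from the escape language.
                int p = i + escape.length();
                if (p >= src.length()) {
                    throw new IllegalArgumentException("dangling escape");
                }
                Character value = ESCAPES.get(src.charAt(p));
                if (value == null) {
                    throw new IllegalArgumentException("unknown escape: " + src.charAt(p));
                }
                out.append(value);
                i = p + 1;
            } else if (src.startsWith(delimiter, i)) {
                // Closing delimiter: the literal ends here.
                return out.toString();
            } else {
                // Ordinary character: copy it through unchanged.
                out.append(src.charAt(i));
                i++;
            }
        }
        throw new IllegalArgumentException("missing closing delimiter");
    }
}
```

Under these assumptions, scan(src, "\"", "\\") handles a classic literal and scan(src, "\"\"\"", "\\") handles a fat one; only the delimiter argument changes, while the escape character and the escape table are shared.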
Classic string literals have opening and closing delimiters of ", an escape character of \, and an escape language that includes “programs” like:
- n: newline
- t: tab
- 0nnn: octal literal
- ": quote character
Fat string literals are the same, except that the opening and closing delimiters are """. But we keep the same escape language. This is valuable.
It is worth asking explicitly: do we want to keep the same escape character too? Guy has suggested offline that we might consider \\\ as the escape character for fat strings.
Looking ahead (but please, let’s not open this discussion now), one of the tools we have at hand for representing degrees of “raw-ness” is this: as we “strengthen” the delimiter, we also strengthen the escape character at the same rate, but keep the escape language intact. This would allow raw strings to be yet another projection of the same basic string literal feature, while requiring increasingly explicit action on the part of the user to access the escape language.
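To make the “same rate” idea concrete, here is a rough Java sketch in which a single hypothetical level parameter lengthens both the delimiter and the escape prefix, while the escape programs stay fixed. Nothing here is a proposal: the names LeveledLiteral, decode and run are invented for illustration, the choice of repeating the " and \ characters is an assumption taken from the """ and \\\ examples in this thread, and the \\ program is added for convenience.

```java
/** Sketch: delimiter and escape prefix strengthened together, escape language unchanged. */
final class LeveledLiteral {
    /** Decodes a literal written at the given level (1 is classic-like, 3 is fat-like). */
    static String decode(String src, int level) {
        String delimiter = "\"".repeat(level); // assumption: level copies of the quote
        String escape = "\\".repeat(level);    // assumption: level copies of the backslash
        if (!src.startsWith(delimiter)) {
            throw new IllegalArgumentException("missing opening delimiter");
        }
        StringBuilder out = new StringBuilder();
        int i = delimiter.length();
        while (i < src.length()) {
            if (src.startsWith(escape, i)) {
                int p = i + escape.length();
                if (p >= src.length()) {
                    throw new IllegalArgumentException("dangling escape");
                }
                out.append(run(src.charAt(p))); // same programs at every level
                i = p + 1;
            } else if (src.startsWith(delimiter, i)) {
                return out.toString();
            } else {
                // Includes shorter runs of backslashes, which are just text here.
                out.append(src.charAt(i));
                i++;
            }
        }
        throw new IllegalArgumentException("missing closing delimiter");
    }

    /** The shared escape language, reduced to a few one-character programs. */
    static char run(char program) {
        switch (program) {
            case 'n': return '\n';
            case 't': return '\t';
            case '"': return '"';
            case '\\': return '\\';
            default: throw new IllegalArgumentException("unknown escape: " + program);
        }
    }
}
```

In this sketch a single backslash inside a level-3 literal is ordinary text, so a Windows-style path needs no doubling, and the newline program is reached only through the longer \\\n spelling. At level 1 the same input text would have its \t read as a tab.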
I bring this up not because I want to talk about raw-ness now (getting the hint?), but because I want to keep all the variations of string literals as lightly-varying projections of the same basic feature. It has come up, for example, that we might treat \<newline> differently in multi-line (ML) strings than in classic strings, but I would prefer that we not tinker with the escape language in nonuniform ways, as this minimizes the variations between the various sub-features. So I offer this peek down the road as a means of soliciting discussion on the pros and cons of keeping \ as our escape character.