RFR: Multi-line String Literal (Preview) JEP [EG Draft]
daniel.smith at oracle.com
Tue May 14 23:15:55 UTC 2019
> On May 13, 2019, at 8:05 AM, Jim Laskey <james.laskey at oracle.com> wrote:
> After some significant tweaks, reopening the JEP for review. https://bugs.openjdk.java.net/browse/JDK-8222530
Something really clicks for me in calling these "text blocks". The delimiter syntax and conventions for line breaks/whitespace, which seemed somewhat arbitrary before, feel right. Nice psychological trick.
Let me weigh in with some design feedback, in a refined form of some comments I made in a previous thread:
Finding the right indentation trimming algorithm has been a struggle. We've come up with something, but it sure seems complex, and I'll bet most programmers will never fully internalize it. The struggle arises primarily because the feature has an ambitious goal of getting it "right" for a wide variety of indentation conventions, and also because the feature is constrained to be a post-processing step, independent of program context. I suggest rethinking both of those requirements.
Instead, the language should be strongly opinionated about how text blocks should be indented, and should take the enclosing context into account. Specifically, the opening """ delimiter should mark the left margin of the text block, and it should be a compiler error to put content to the left of that margin. This results in a really simple, readable approach to indenting: the delimiter marks the rectangle.
- The *prefix* of a text block is the program text after the immediately preceding \n or \r, up to the opening """, with every non-whitespace character replaced with a space (\u0020).
- The form of a text block is """ <whitespace>* ( <newline> <prefix> <content>* )+ """ (that is, opening delimiter, ignored whitespace, then one or more lines of content, each prefixed by a newline and the *prefix*; all prefixes must be identical).
- The string denoted by a text block is formed by taking each line's <content>* after escape processing and joining those lines with '\n'.
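To make the three rules concrete, here is a rough Java sketch of them. It is not javac's implementation or any JDK API. The class and methods (TextBlockSketch, prefixOf, denote) are hypothetical. It assumes JDK 15 or later for String.translateEscapes, and it assumes \r\n, \r, and \n all count as <newline>. It takes the program text before the opening delimiter and the raw body between the delimiters. It then rejects any line that does not start with the exact prefix, which is the "content left of the margin" error.

```java
import java.util.ArrayList;
import java.util.List;

public class TextBlockSketch {

    /** Hypothetical helper: keep whitespace (tabs included), turn everything else into a space. */
    static String prefixOf(String lineBeforeDelimiter) {
        StringBuilder sb = new StringBuilder();
        for (char c : lineBeforeDelimiter.toCharArray()) {
            sb.append(Character.isWhitespace(c) ? c : ' ');
        }
        return sb.toString();
    }

    /**
     * Hypothetical entry point.
     * lineBeforeDelimiter: program text after the preceding line break, up to the opening """.
     * body: raw source text between the opening and closing """.
     */
    static String denote(String lineBeforeDelimiter, String body) {
        String prefix = prefixOf(lineBeforeDelimiter);
        // Limit -1 keeps trailing empty segments, such as the line holding the closing """.
        String[] lines = body.split("\r\n|\r|\n", -1);
        if (lines.length < 2) {
            throw new IllegalArgumentException("a text block needs at least one line of content");
        }
        // <whitespace>* after the opening delimiter is ignored; anything else is an error.
        if (!lines[0].isBlank()) {
            throw new IllegalArgumentException("only whitespace may follow the opening delimiter");
        }
        List<String> contents = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            // Every line must repeat the prefix exactly, or content sits left of the margin.
            if (!lines[i].startsWith(prefix)) {
                throw new IllegalArgumentException("line " + i + " does not start with the prefix");
            }
            // Strip the prefix, then apply escape processing to what remains.
            contents.add(lines[i].substring(prefix.length()).translateEscapes());
        }
        return String.join("\n", contents);
    }

    public static void main(String[] args) {
        String pad = " ".repeat(8); // prefix of "    x = " is eight spaces
        String body = "\n" + pad + "Hello\n" + pad + "  world\n" + pad;
        System.out.print(denote("    x = ", body)); // "Hello\n  world\n"
    }
}
```

Taken literally, these rules require even a blank line inside the block to carry the full prefix, and the sketch enforces exactly that.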
Most of the examples in the JEP follow these rules as a convention already. The concatenation examples would benefit from following it.
What if I want to shift my content left? Just put a line break before the opening delimiter, and align it wherever you want to set your left margin. (If you don't want to strip anything, put the opening delimiter in column 0.) Your n-line text block now takes n+1 lines, which is no big deal.
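A small illustration of how the delimiter's position sets the margin. MarginDemo and prefixOf are hypothetical names. prefixOf approximates the *prefix* rule with Java's regex \S, so it only treats ASCII whitespace as whitespace.

```java
public class MarginDemo {
    // Hypothetical helper: every non-whitespace character before the opening """ becomes a space.
    static String prefixOf(String lineBeforeDelimiter) {
        return lineBeforeDelimiter.replaceAll("\\S", " ");
    }

    public static void main(String[] args) {
        // Delimiter at the end of a declaration: the margin sits after "String s = ".
        String sameLine = prefixOf("    String s = ");
        // Delimiter moved to its own line, indented 8 columns: the margin moves left.
        String ownLine = prefixOf("        ");
        // Delimiter in column 0: the prefix is empty, so nothing is stripped.
        String column0 = prefixOf("");
        System.out.println(sameLine.length() + " " + ownLine.length() + " " + column0.length()); // prints "15 8 0"
    }
}
```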
What if I want to shift my content right, beyond the delimiters? Don't do that. That's not how text blocks work. (I mean, you can do it, but your extra whitespace will be included in the denoted string.)
What about tabs? Tabs that come before the opening delimiter are recognized, and all prefixes must use the same pattern of tabs/spaces/[other exotic whitespace]. What if you want to have program text on the same line as the opening delimiter, but then want to use tabs underneath, like this?
\t \t System.out.println("""
\t \t \t \t \t \t \t Hello world!
\t \t \t \t \t \t \t """);
Well, then you're doing tabs wrong—different tab widths will make "Hello world!" appear to the left or right of the delimiters. So this is an error. Either use spaces after the first two tabs, or put the opening delimiter on a new line.
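Here is the tab example checked against the prefix rule. The tabs are written as Java "\t" escapes with no spaces between them. TabPrefixDemo and prefixOf are hypothetical names. The prefix of the first line is two tabs followed by 19 spaces, one for each character of "System.out.println(". An all-tab content line therefore fails the check, and a two-tabs-then-spaces line passes.

```java
public class TabPrefixDemo {
    // Hypothetical helper: keep whitespace as-is, replace other characters with a space.
    static String prefixOf(String lineBeforeDelimiter) {
        StringBuilder sb = new StringBuilder();
        for (char c : lineBeforeDelimiter.toCharArray()) {
            sb.append(Character.isWhitespace(c) ? c : ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String prefix = prefixOf("\t\tSystem.out.println(");        // "\t\t" + 19 spaces
        String allTabs = "\t".repeat(7) + "Hello world!";            // the layout shown above
        String tabsThenSpaces = "\t\t" + " ".repeat(19) + "Hello world!";
        System.out.println(allTabs.startsWith(prefix));        // false: rejected
        System.out.println(tabsThenSpaces.startsWith(prefix)); // true: accepted
    }
}
```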
What about variable-width fonts? If you expect your code to be read in a variable-width font, by convention you should start all text blocks on a (possibly-indented) blank line.
What about Unicode escapes? It's an orthogonal question, but I think it's fine to continue pre-processing all Unicode escapes. If you want to obfuscate prefixes and line breaks using \u0020 and \u000a, go for it.