String reboot (plain text)
kevinb at google.com
Wed Mar 13 18:56:32 UTC 2019
On Wed, Mar 13, 2019 at 10:52 AM Brian Goetz <brian.goetz at oracle.com> wrote:
> Lots of good discussion so far. Let me gather the threads.
> - The primary use case is embedding multi-line chunks of foreign code or
> data in Java, with minimal need to cruft it up with escaping. This says to
> me that _multi-line strings_ are actually the high-order bit here, and raw
> strings are the next bit. Let’s address these in order.
We have found this to be true; we have also found that the next few use
cases are actually not far behind: console output, long "expected" strings
in tests, and long exception messages. (The last case is interesting
because you don't really want to keep \n's in the string at runtime, yet
it's still *way* nicer to write as a multiline literal that can be easily
reflowed without dealing with `" + "`.)
Happily, these cases also all support the claim that multi-line-ness is
more often desired than raw-ness.
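To make the exception-message case concrete, here is a minimal sketch. The class name, the message text, and the `unwrap` helper are all hypothetical and are not part of any proposal; the helper only stands in for "write it across several lines, but keep no \n at runtime".

```java
public class LongMessageSketch {
    // Today's form: every source line needs its own quotes and a "+",
    // and reflowing the text means re-cutting the literals by hand.
    static String concatenated() {
        return "Could not load the configuration file; check that the path "
            + "exists, is readable, and contains valid JSON.";
    }

    // Hypothetical helper: joins text written across several lines with
    // single spaces, so that no newline survives at runtime.
    static String unwrap(String multiLine) {
        return multiLine.trim().replaceAll("\\s*\\n\\s*", " ");
    }

    public static void main(String[] args) {
        // Stand-in for what a multi-line literal would contain.
        String written =
            "Could not load the configuration file; check that the path\n"
            + "    exists, is readable, and contains valid JSON.\n";
        System.out.println(unwrap(written).equals(concatenated()));
    }
}
```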
> - Multi-line-ness and raw-ness are orthogonal concepts.
Is that true, as stated? I would have said that any support for rawness
automatically gives you support for multi-line-ness by nature, because a
newline character becomes literal. That doesn't seem like orthogonality.
I say this because it's the reason I was always completely fine with the
fact we were talking only about a "raw" feature and not two independent
features. The proposal as it was published months ago would have done
somewhere close to 100% of what our codebase needs... if only we could have
settled how to get indentation stripping. We had options for how that could
be done in a reasonably learnable way, and of course with the strict
requirement that the "I only care about rawness" users are unaffected. I
know this opinion is not shared, but it seemed to me that it was only our
discomfort with writing the stripping behavior into the language spec, and
nothing else, that stopped us from having a great solution.
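For readers who have not seen what "indentation stripping" means in practice, the following is a minimal sketch under simplifying assumptions: only spaces count as indentation, and blank lines are ignored when measuring. The `stripCommonIndent` name and its rules are hypothetical; this is not the algorithm from any of the proposals, only an illustration of the kind of behavior being discussed.

```java
public class IndentStripSketch {
    // Hypothetical helper: removes the smallest leading-space count shared
    // by all non-blank lines. Blank lines are emitted as empty lines.
    static String stripCommonIndent(String text) {
        String[] lines = text.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // blank lines do not count
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') indent++;
            min = Math.min(min, indent);
        }
        if (min == Integer.MAX_VALUE) min = 0; // no non-blank lines at all
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (!lines[i].trim().isEmpty()) out.append(lines[i].substring(min));
            if (i < lines.length - 1) out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String embedded = "    SELECT id\n      FROM users\n    WHERE active\n";
        System.out.print(stripCommonIndent(embedded));
    }
}
```

The relative indentation of the second line is preserved; only the margin that comes from the surrounding Java code is removed.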
> Some languages merge them, and we might consider doing that too, but we
> shouldn’t start there.
> - For multi-line strings, a stronger delimiter (e.g., """) seems to be
> preferred on readability grounds, because people don't want to have to
> squint to see where the embedded code ends and the Java code resumes.
Valid point. Today, every line or group of lines in a .java source file
*is* Java code, but now there will be sections where that's not at all
clearly the case. Making the boundaries clear between the two types of
code seems like a good practice. The old proposal *allowed* a single
backtick to set off these sections in 99% of cases, but it occurred to me
that developers would often be better off using more of them just to
delineate better...
> To which I'll add the following observations:
> - Most multi-line string candidates (JSON, XML, SQL, etc) do not require
> characters that have to be escaped, as long as we don't have conflicts with
> the quote character.
(We did find this to be true. Quotes, of course, are quite common.)
> For our first course, we could have multi-line strings, delimited by the
> fixed delimiter """. These would be escaped strings, just like existing
> string literals, but because the single-quote is no longer the delimiter,
> the most common source of escaping (embedded quotes) is removed. Most
> multi-line strings will require no escaping at all.
> Note that if we stopped here _and never ordered anything else_, we would
> still be in a much better place than we are now (most snippets could just
> be cut and pasted without mangling), and what we've introduced is
> dead-simple! So the cost-benefit ratio here is high; it’s a simple
> addition that addresses a significant fraction of the pain points. I think
> we should at least order this.
This is true. (Call this State A for now.)
> Now, maybe we're still a little hungry, and the above doesn't help with
> those strings that are most polluted by escapes, such as regular
> expressions. So, we might additionally order the ability to layer a way to
> say "no escape mangling" atop both our " strings and our """
> strings. Jim proposes we use a delimiter of \".."\ for such strings (\"""
> ... """\ for the multi-line version). This has a nice connotation; it is
> as if the backslash is “distributed over” the whole string.
This is the part that concerns me a lot. I think that adding *two* new
string-literal features that can be used separately or together is putting
the language in a *much* more complex state. If we reached State A (above)
I would feel much better about stopping there than coming here.
And it would be a bummer about regular expressions. I believe State A is
inferior to the proposal we looked at several months ago, which did a
pretty good job of also handling things like that.
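A small sketch of why regular expressions are the sore spot in State A: since the """ strings described above would still process escapes like existing literals, every backslash in a pattern still has to be doubled. The class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

public class RegexEscapeSketch {
    // The regex \d+\.\d+ (digits, a literal dot, digits) must double every
    // backslash when written as a conventional, escape-processing literal.
    static final Pattern DECIMAL = Pattern.compile("\\d+\\.\\d+");

    public static void main(String[] args) {
        System.out.println(DECIMAL.matcher("3.14").matches());
        System.out.println(DECIMAL.matcher("3x14").matches());
    }
}
```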
> 1a. Do 1, but automatically reflow multi-line strings using the equivalent
> of String::align. There have been reasonable proposals on how to do this;
> where they fell apart is the interaction with raw-ness,
Did they? I didn't think they did. The problem that a raw string might get
unintentionally stripped seemed to me like one we had easy ways to deal with.
As Brian knows well, it really surprises me that we went back to the
drawing board with this feature, because the reasons we cited for doing so
seemed so very minor.
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com