Extended string literals feedback
scolebourne at joda.org
Thu Jan 3 01:14:51 UTC 2019
Thanks again for reopening the debate. I agree with the vast majority
of the comments made by Brian and Jim. Running through the decision
trees and other points:

Multi-line vs Raw?
I think there is a need for both, and they seem pretty orthogonal.
Regexes are a pain, and multi-line support has always simply been
missing. I fully agree that proper consideration of multi-line strings
as their own feature is worth doing.

JSON, HTML, XML, SQL drivers?
Java can't be ignored either, but that doesn't mean perfect support is
needed. I think the broader thrust is that the goal is not any "old
random set of characters" or "perfect wrapping around a complex
language". I'm 100% sold on getting the 98% use case correct - there
is of course debate about how bad the last 2% is.
It depends on how often it comes up as a problem. As a basic
principle, escaping in multi line strings seems fine, but it also
seems reasonable to not rule out raw multi-line strings.
Few will object to making more use of double quotes. I think there is
also a case for more use of single quotes. See also below.
I would specialize such that triple quotes are needed for multi-line.
My viewpoint is that when you have a multi-line section of embedded
code, it is reasonable to require a more weighty delimiter for basic
readability reasons. It is also the case that IDEs currently have
behaviour for when the return key is pressed within a string literal,
and unlearning that behaviour would be painful (not a strong argument,
but worth mentioning for consideration). I'd also note that lots of
code is read outside an IDE (e.g. on GitHub PRs), so any justification
relying on IDE syntax highlighting should be struck off.
Cooked and newline-normalized sound good.

Additional possible requirements?
I'd like to reserve a syntax for value type literals, such as for
LocalDate where the parse occurs at compile time. I'd also like to
reserve a syntax for parsed multi-line literals, such as XML that is
parsed at compile time. These may or may not be the same feature.

Raw string requirements?
Regex is much more significant than Windows paths IMO, simply because
Windows generally also understands Unix-style paths. What regex
primarily needs is no backslash escapes, but the ability to contain a
double quote is also important (in my experience, regexes contain
double quotes more often than single quotes, but I don't have any
actual data for that).
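
A minimal sketch of the regex point, written in Python only because its r"..." literals are close in spirit to the raw strings under discussion; it is not proposed Java syntax. It shows that a cooked literal needs every backslash doubled, and that a pattern containing double quotes collides with a double quote delimiter. The sample pattern and input text are made up for illustration.

```python
import re

# Cooked literal: every regex backslash must itself be escaped.
cooked = "\\d+\\.\\d+"
# Raw literal: the pattern is written exactly as the regex engine sees it.
raw = r"\d+\.\d+"
assert cooked == raw

# A pattern matching a double-quoted field. Inside a double-quoted cooked
# literal, each embedded double quote needs a backslash...
quoted_cooked = "\"[^\"]*\""
# ...whereas a raw literal with a different delimiter needs no escapes.
quoted_raw = r'"[^"]*"'
assert quoted_cooked == quoted_raw

match = re.search(quoted_raw, 'name="amber-dev" list')
print(match.group(0))  # prints "amber-dev", including the quotes
```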

Is rawness a property or a state?
I see why state might be appealing, but to be honest I don't think it's
a good fit in Java (modes of operation switched in and out are
generally a feature of complex APIs). The r prefix seems a lot
clearer. (A single letter prefix allows for combinations of letters in
the future if that ever makes sense, e.g. rs""" might be the prefix for
a raw string stripped of leading spaces).
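
To give a feel for what a combined prefix might mean, here is a rough Python approximation. Python has no rs prefix, so this sketch assumes that "stripped of leading spaces" means removing the common indentation, and emulates it with a raw triple-quoted literal plus textwrap.dedent. The variable name is hypothetical.

```python
import textwrap

# Hypothetical stand-in for an rs""" literal: a raw triple-quoted string
# whose common leading indentation is removed after the fact.
pattern = textwrap.dedent(r"""
    \d{4}
    \w+
""").strip("\n")

# Backslashes survive untouched (raw), indentation is gone (stripped).
assert pattern == "\\d{4}\n\\w+"
```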

What direction would I take?
For the single line case I would allow single double quotes or triple
double quotes. For the multi-line case, I would only allow triple
double quotes. Raw would be completely orthogonal, selected via a prefix.
This is pretty simple to understand and explain, and covers most use
cases. Allowing triple double quotes on a single line deals with most
of the use cases to escape double quotes within raw strings.
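
Python's literal forms happen to line up closely with the combinations described above, so the following sketch uses Python to show how the delimiter choice and the raw prefix stay independent of each other. This is an analogy only, not the proposed Java grammar, and the variable names are hypothetical.

```python
# Single-line, basic delimiter (cooked: the backslash must be escaped).
basic = "C:\\temp"
# Single-line, triple delimiter: embedded double quotes need no escaping.
triple_single_line = """She said "hi" and left"""
# Multi-line: only the triple delimiter may span lines.
multi = """line one
line two"""
# Raw is an independent prefix that applies to either delimiter.
raw_basic = r"C:\temp"
raw_triple = r"""\d+ "quoted" \w+"""

assert basic == raw_basic
assert triple_single_line == 'She said "hi" and left'
assert multi.split("\n") == ["line one", "line two"]
assert raw_triple == '\\d+ "quoted" \\w+'
```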
The last problem worth mitigating is formats like CSV, which yield
triple quotes relatively easily because quotes "double up" to escape.
See https://stackoverflow.com/a/769820/38896 for an example (i.e.
because of the "double up" use cases, I think this is part of the 98%,
not part of the 2%). To mitigate this problem, I think there is a
simple approach - make triple single quote have the same meaning as
triple double quote.
In JLS terms, this would be adding a new "extended string literal"
with a delimiter of 3 quotes (either 3 single quotes or 3 double
quotes). Plus both basic and extended string literals would be
convertible to raw via a prefix.
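
The CSV "double up" case can be sketched in Python, whose triple single quote delimiter behaves much like the one suggested here. This is an illustration by analogy, not Java syntax, and the record content is invented. The quoted field ends in a run of three double quotes, which would terminate a triple double quote literal early, so the triple single quote delimiter is used instead.

```python
import csv
import io

# A CSV record whose second field is the text: say "hi"
# CSV escapes each embedded quote by doubling it, so the field ends in """.
record = '''id,"say ""hi"""'''

# The same content in a basic cooked literal needs six backslash escapes.
escaped = "id,\"say \"\"hi\"\"\""
assert record == escaped

fields = next(csv.reader(io.StringIO(record)))
print(fields)  # ['id', 'say "hi"']
```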