String literals: some principles
john.r.rose at oracle.com
Fri May 3 20:21:04 UTC 2019
On Apr 29, 2019, at 8:48 AM, Guy Steele <guy.steele at oracle.com> wrote:
>> On Apr 28, 2019, at 4:32 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> . . .
>> Looking ahead to the next round, we can build on this. In the first round, we mistakenly thought that there was something that could reasonably be called a “raw” string, but this notion is a fantasy; no string literal is so raw that it can’t recognize its closing delimiter. So “rawness” is really only a matter of degree.
> This is _almost_ true. If a string is truly raw (that is, it can contain _anything_), then one absolutely cannot depend on recognizing the closing delimiter by examining what might be the raw content.
> Put another way: one cannot determine how long the raw content is by examining it. That’s a solid principle.
I'm going to be nit-picky here and refer to my earlier
mentions of the paradigm of strong quoting, which
at its heart simply means you have an infinite set of
delimiters to choose from, when wrapping a payload
into a literal syntax.
Adding a numeral to the open quote means that there
is now an unbounded set of open quotes, so it is an
instance of strong quoting. Another instance of strong
quoting adds nonces, and yet another just lengthens
the quote pattern until it doesn't occur (anywhere) in
the raw string payload.
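
As a rough illustration of the "lengthen the quote pattern" form of
strong quoting, here is a minimal Java sketch. The class and method
names (StrongQuote, chooseDelimiter) are hypothetical, and the sketch
assumes the quote pattern is a run of double-quote characters. It
ignores the separate problem of a payload that begins or ends with a
double-quote sitting next to the delimiter.

```java
final class StrongQuote {
    /**
     * Returns the shortest run of double-quote characters (at least one)
     * that does not occur anywhere in the payload.
     * Hypothetical helper, for illustration only.
     */
    static String chooseDelimiter(String payload) {
        StringBuilder delimiter = new StringBuilder("\"");
        // Lengthen the candidate until the payload no longer contains it.
        while (payload.contains(delimiter)) {
            delimiter.append('"');
        }
        return delimiter.toString();
    }

    public static void main(String[] args) {
        // A payload with no quotes needs only a single quote character.
        System.out.println(chooseDelimiter("abc").length());
        // A payload with isolated quotes needs a run of two.
        System.out.println(chooseDelimiter("say \"hi\"").length());
    }
}
```

The loop always terminates, because a run longer than the payload
cannot be a substring of it.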
The numeric prefix convention is different from other
kinds of strong quoting conventions, in that the end-quote
can be a substring of the payload. Actually, the end-quote
is most naturally the empty string, which is a substring
of every string.
The numeric prefix convention and other strong-quote
conventions all share a common property: The convention
as a whole is universal for arbitrary payloads, but for
any given payload there are quotes which work and others
that don't work. In the case of the numeric prefix
convention, once you choose an open-quote (with
numeral) you are limited to payloads of that length.
That's not quite a "raw string" any more, since it's
suitable only for a fixed-sized character field.
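
To make the numeric prefix convention concrete, here is a minimal
Java sketch under assumed syntax: a decimal length, one double-quote,
then exactly that many characters, with an empty end-quote. The
syntax and the names (LengthPrefixed, encode, decode) are assumptions
made for illustration, not a proposal from this thread, and length is
counted in UTF-16 code units for simplicity.

```java
final class LengthPrefixed {
    /** Wraps a payload as: decimal length, one double-quote, payload. */
    static String encode(String payload) {
        return payload.length() + "\"" + payload;
    }

    /**
     * Reads one literal from the start of the given text. The payload is
     * never scanned for a delimiter; only the declared length is used.
     */
    static String decode(String literal) {
        int quote = literal.indexOf('"');
        if (quote <= 0) {
            throw new IllegalArgumentException("missing length prefix or open quote");
        }
        int length = Integer.parseInt(literal.substring(0, quote));
        int start = quote + 1;
        if (length < 0 || start + length > literal.length()) {
            throw new IllegalArgumentException("payload shorter than declared length");
        }
        return literal.substring(start, start + length);
    }

    public static void main(String[] args) {
        String literal = encode("a\"b");
        System.out.println(literal);
        System.out.println(decode(literal).equals("a\"b"));
    }
}
```

Note that the open quote produced for a three-character payload can
only ever wrap three-character payloads, which is the fixed-size
limitation described above.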
Likewise, once you choose a particular nonce-based
or patterned quote (e.g., seven double-quotes),
payloads containing the corresponding end-quote
as a substring are no longer suitable.
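
The "some quotes work and others don't" property can be written as a
pair of predicates. This is a minimal Java sketch (Java 11 or later,
for String.repeat); the names QuoteFit, patternQuoteFits, and
numericQuoteFits are hypothetical, and the patterned quote is assumed
to be a run of double-quote characters.

```java
final class QuoteFit {
    /** A patterned quote fits if the payload does not contain the end-quote. */
    static boolean patternQuoteFits(int quoteLength, String payload) {
        String endQuote = "\"".repeat(quoteLength);
        return !payload.contains(endQuote);
    }

    /** A numeric-prefix quote fits only payloads of exactly the declared length. */
    static boolean numericQuoteFits(int declaredLength, String payload) {
        return payload.length() == declaredLength;
    }

    public static void main(String[] args) {
        String sevenQuotes = "\"".repeat(7);
        // Seven double-quotes cannot wrap a payload that contains seven double-quotes.
        System.out.println(patternQuoteFits(7, "x" + sevenQuotes + "y"));
        // A longer quote from the same family can.
        System.out.println(patternQuoteFits(8, "x" + sevenQuotes + "y"));
    }
}
```

Each family is universal because, for any payload, some argument
makes its predicate true; no single argument does so for every
payload.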
Once you pick a particular payload string, the next
question is whether you can embed that particular
string into your program without inserting escape
sequences. Only with a strong quote scheme of
some sort is this possible. But, with any of several
strong quote schemes, it is possible to dispense
with escapes for any given string; it is not a fantasy.