Raw string literals and Unicode escapes
guy.steele at oracle.com
Tue Feb 27 21:12:14 UTC 2018
> On Feb 27, 2018, at 4:20 PM, John Rose <john.r.rose at oracle.com> wrote:
> On Feb 27, 2018, at 11:48 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>> So after this length instead of having the probability to see a character to be virtually 1, you have the opposite effect, because programming languages (a human construct) are very regular in the set of chars they use. So you do not need a repetition of a character to avoid a statistical effect that does not occur. Being able to choose the escape character is enough.
>> The problem is not that it's enough, it's that it is too much. Having nine ways to say the same thing is too many; having infinitely many (e.g., nonces) is worse. Having used the "pick your delimiter" approach taken by Perl, I find that you are *still* often bitten by the inability to find a good delimiter for embedding a snippet of a program written in a language similar to the outer language. And it surely makes code less readable, because many more things can be interpreted as quotes.
> My experience tracks with Brian's. That's why I think the random string
> model is more robust than some vague hope that languages won't overlap.
> Yes, random strings are an outlier, but less so than you'd think. A typical
> compression ratio for code is 5x, which means that if you replace "random
> string of length 10" with "random code snippet of length 50" you get the
> same analytic results. In order to exclude a close-quote, you need an
> additional constraint, which in practical terms results in folks having to
> grub around inside their raw strings looking for accidental quotes.
Which leads us to the following theoretical result: the ```` mechanism does not require you to grub around in the interior of the string AT ALL if you don’t want to. All you need to know is the length. If the length of the raw string is n, and it does not begin or end with ` (a necessary check in any case), then using n-1 backquote characters before and after will always do the job.
In practice, many programmers (and programs) will be willing to do a quick search to see whether “```” or failing that “````” happens to be absent from the raw string. :-)
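
For illustration only, here is a rough Java sketch of the two strategies described above. The class and method names (RawStringDelimiters, lengthBound, shortestAbsentRun, wrap) are hypothetical and come from no proposal or library. The length-only rule works because a body of length n that neither begins nor ends with a backquote cannot contain a run of more than n-2 backquotes. Two things in the sketch are assumptions beyond the message: the one-character body is given a single backquote (n-1 would be 0 there), and the search starts at one backquote rather than three.

```java
// Hypothetical helper, not part of any raw-string proposal or JDK API.
final class RawStringDelimiters {

    // Length-only rule: n - 1 backquotes, without inspecting the interior.
    // Precondition (from the message): body does not begin or end with '`'.
    static int lengthBound(String body) {
        if (body.isEmpty()
                || body.charAt(0) == '`'
                || body.charAt(body.length() - 1) == '`') {
            throw new IllegalArgumentException(
                "body must be non-empty and must not begin or end with a backquote");
        }
        // Assumption: a one-character body still needs one backquote.
        return Math.max(1, body.length() - 1);
    }

    // Search-based rule: try runs of 1, 2, 3, ... backquotes and return the
    // length of the first run that does not occur anywhere in the body.
    static int shortestAbsentRun(String body) {
        StringBuilder run = new StringBuilder("`");
        while (body.contains(run)) {
            run.append('`');
        }
        return run.length();
    }

    // Surround the body with `count` backquotes on each side.
    static String wrap(String body, int count) {
        StringBuilder fence = new StringBuilder();
        for (int i = 0; i < count; i++) {
            fence.append('`');
        }
        return fence + body + fence;
    }
}
```

For the six-character body a`b``c, lengthBound gives 5 with no scanning, while shortestAbsentRun gives 3, since one and two backquotes both occur in the body but three do not.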