Raw string literals and Unicode escapes
john.r.rose at oracle.com
Tue Feb 27 21:20:56 UTC 2018
On Feb 27, 2018, at 11:48 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> So after this length instead of having the probability to see a character to be virtually 1, you have the opposite effect, because programming languages (a human construct) are very regular in the set of chars they use. So you do not need a repetition of a character to avoid a statistical effect that does not occur. Being able to choose the escape character is enough.
> The problem is not that it's enough, it's that it is too much. Having nine ways to say the same thing is too many; having infinitely many (e.g., nonces) is worse. Having used the "pick your delimiter" approach taken by Perl, I find that you are *still* often bitten by the inability to find a good delimiter for embedding a snippet of a program written in a language similar to the outer language. And it surely makes code less readable, because many more things can be interpreted as quotes.
My experience tracks with Brian's. That's why I think the random string
model is more robust than some vague hope that languages won't overlap.
Yes, random strings are an outlier, but less so than you'd think. A typical
compression ratio for code is 5x, which means that if you replace "random
string of length 10" with "random code snippet of length 50" you get the
same analytic results. In order to exclude a close-quote, you need an
additional constraint, which in practical terms results in folks having to
grub around inside their raw strings looking for accidental quotes.
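
The arithmetic above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the 64-symbol alphabet, the class name
DelimiterOdds, and both helper methods are hypothetical, and the model treats
a string as uniformly random characters, which real code is not.

```java
public class DelimiterOdds {
    // Effective "random" length of a code snippet, assuming the snippet
    // compresses by the given ratio (5x is the figure quoted above).
    public static double effectiveLength(int codeLength, double compressionRatio) {
        return codeLength / compressionRatio;
    }

    // Expected number of times a fixed delimiter of length d appears in a
    // uniformly random string of length n over an alphabet of k symbols:
    // (n - d + 1) starting positions, each matching with probability 1 / k^d.
    public static double expectedOccurrences(int n, int k, int d) {
        if (n < d) {
            return 0.0;
        }
        return (n - d + 1) / Math.pow(k, d);
    }

    public static void main(String[] args) {
        int alphabetSize = 64; // assumed alphabet size, for illustration only
        int randomLength = (int) effectiveLength(50, 5.0);
        System.out.println("effective random length: " + randomLength);
        for (int d = 1; d <= 3; d++) {
            System.out.println("delimiter length " + d + ": expected occurrences = "
                    + expectedOccurrences(randomLength, alphabetSize, d));
        }
    }
}
```

Under this toy model a longer delimiter collides much less often, but the
expected count never reaches zero without an extra constraint on the body.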