Raw string literals and Unicode escapes
forax at univ-mlv.fr
Tue Feb 27 21:46:38 UTC 2018
> From: "Guy Steele" <guy.steele at oracle.com>
> To: "John Rose" <john.r.rose at oracle.com>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Tuesday, 27 February 2018 22:12:14
> Subject: Re: Raw string literals and Unicode escapes
>> On Feb 27, 2018, at 4:20 PM, John Rose <john.r.rose at oracle.com> wrote:
>> On Feb 27, 2018, at 11:48 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>>> So after this length, instead of the probability of seeing a given character
>>>> being virtually 1, you have the opposite effect, because programming languages (a
>>>> human construct) are very regular in the set of chars they use. So you do not
>>>> need a repetition of a character to avoid a statistical effect that does not
>>>> occur. Being able to choose the escape character is enough.
>>> The problem is not that it's enough, it's that it is too much. Having nine ways
>>> to say the same thing is too many; having infinitely many (e.g., nonces) is
>>> worse. Having used the "pick your delimiter" approach taken by Perl, I find
>>> that you are *still* often bitten by the inability to find a good delimiter for
>>> embedding a snippet of a program written in a language similar to the outer
>>> language. And it surely makes code less readable, because many more things can
>>> be interpreted as quotes.
>> My experience tracks with Brian's. That's why I think the random string
>> model is more robust than some vague hope that languages won't overlap.
>> Yes, random strings are an outlier, but less so than you'd think. A typical
>> compression ratio for code is 5x, which means that if you replace "random
>> string of length 10" with "random code snippet of length 50" you get the
>> same analytic results. In order to exclude a close-quote, you need an
>> additional constraint, which in practical terms results in folks having to
>> grub around inside their raw strings looking for accidental quotes.
> Which leads us to the following theoretical result: the ```` mechanism does not
> require you to grub around in the interior of the string AT ALL if you don’t
> want to. All you need to know is the length. If the length of the raw string is
> n, and it does not begin or end with ` (a necessary check in any case), then
> using n-1 backquote characters before and after will always do the job.
> In practice, many programmers (and programs) will be willing to do a quick
> search to see whether “```” or failing that “````” happens to be absent from
> the raw string. :-)
OK, I'm clearly in the minority here; the repetition pattern wins.
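
For the record, Guy's two observations above (the n-1 length bound, and the "quick search" for a backquote run that is absent from the body) can be sketched in a few lines of Java. This is only an illustration, not anything from the proposal: the class name RawDelimiter and its methods are hypothetical, it assumes Java 11+ for String.repeat, and it assumes a non-empty body that neither begins nor ends with a backquote. It picks the delimiter as one backquote longer than the longest run in the body, which is the shortest length whose run is entirely absent from the string.

```java
public class RawDelimiter {

    // Hypothetical helper: rejects bodies the n-1 argument does not cover.
    static void checkEnds(String body) {
        if (body.isEmpty()
                || body.charAt(0) == '`'
                || body.charAt(body.length() - 1) == '`') {
            throw new IllegalArgumentException(
                "body must be non-empty and must not begin or end with a backquote");
        }
    }

    // Length-only bound: n - 1 backquotes always suffice for a body of
    // length n, without looking inside the body (at least 1 for n = 1).
    static int lengthBound(String body) {
        checkEnds(body);
        return Math.max(1, body.length() - 1);
    }

    // "Quick search": the shortest k such that a run of k backquotes
    // does not occur anywhere in the body, i.e. longest run + 1.
    static int shortestDelimiter(String body) {
        checkEnds(body);
        int longest = 0;
        int run = 0;
        for (int i = 0; i < body.length(); i++) {
            run = (body.charAt(i) == '`') ? run + 1 : 0;
            longest = Math.max(longest, run);
        }
        return longest + 1;
    }

    // Wraps the body in the shortest safe delimiter on both sides.
    static String quote(String body) {
        String fence = "`".repeat(shortestDelimiter(body));
        return fence + body + fence;
    }

    public static void main(String[] args) {
        String body = "x``y`z";
        System.out.println(shortestDelimiter(body) + " <= " + lengthBound(body));
        System.out.println(quote(body));
    }
}
```

Since the first and last characters are not backquotes, the longest run inside a body of length n is at most n-2, so the searched delimiter never exceeds the n-1 bound.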