PROPOSAL: Multiline strings

Reinier Zwitserloot reinier at
Sun Mar 1 10:41:54 PST 2009

Embedding regexps has two significant advantages:

  1. compile-time checking of your regexps. Sure, most random  
gibberish just so happens to be a valid regexp, but there are rules -  
you can have mismatched parentheses, for example.

  2. compile-time compilation of regexps. If compiling the regexp is  
allowed to take rather long, then you can effectively create O(n)  
matching algorithms, where n is the size of the input string. What  
better time is there to do the compilation of the regexp than when  
you're compiling the code? Javac would essentially include the  
serialized form of a compiled regexp into the class file, instead of  
the string.
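
To make both points concrete, here is a minimal sketch of how things stand today with java.util.regex. The class name RegexCheckSketch and the pattern WORD_PAIR are made up for illustration. A malformed regexp is only detected when Pattern.compile runs, and the nearest thing to "compile-time compilation" is compiling once into a static final field at class-load time.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration class; not part of any proposal.
final class RegexCheckSketch {
    // Compiled once when the class is loaded. A regexp literal could in
    // principle move this work (and any syntax error) to javac.
    static final Pattern WORD_PAIR = Pattern.compile("(\\w+)\\s+(\\w+)");

    // Returns true if the regexp compiles. Today a syntax error such as a
    // mismatched parenthesis only surfaces here, at run time.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("(a|b)*c")); // well-formed
        System.out.println(isValidRegex("(a|b"));    // unclosed group
        System.out.println(WORD_PAIR.matcher("hello world").matches());
    }
}
```

Note the doubled backslashes in WORD_PAIR: that is the escaping cost that raw strings or regexp literals would remove.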

As for the multi-line string proposal: it is very incomplete. I
suggest resubmitting it with documentation on handling raw strings
(let's leave regexp literals for another proposal; as has been said,
even if the language has regexp literals, raw strings are still a
useful construct), and on handling whitespace. It should also cover
handling of newlines: if the file contains \r\n because it was written
on Windows, should those be kept as is or replaced with \n line
endings? The latter seems like the right answer to me.
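
As a rough sketch of the newline rule preferred above, the normalization amounts to one replacement over the literal's raw text. The class and method names are hypothetical, and the sketch assumes only \r\n needs handling, since that is the only case raised here.

```java
// Hypothetical illustration class; not part of any proposal.
final class NewlineSketch {
    // Replace Windows line endings (\r\n) with \n, so the literal's value
    // does not depend on the platform the source file was saved on.
    static String normalizeNewlines(String raw) {
        return raw.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String fromWindowsFile = "bar\r\nbaz\r\nqux";
        System.out.println(normalizeNewlines(fromWindowsFile).equals("bar\nbaz\nqux"));
    }
}
```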

My personal favourite way to do whitespace:

After the first newline, strip all leading whitespace from the first
line of content. That exact run of whitespace (no translating of tabs
to spaces) is the indent, and it is stripped from every following line
as well. Thus, the following:

String foo = """

is equal to: String foo = "bar\n            baz\n  bla\nqux";

and the following:

String foo = """

is a compile-time error.
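
The whitespace rule can be sketched as a small function over the raw text between the delimiters. This rests on several assumptions: the class and method names are hypothetical, the sketch only covers the case where the opening delimiter is immediately followed by a newline, and it assumes the error case is a line that does not begin with the indent. An exception stands in for the compile-time error.

```java
// Hypothetical illustration class; not part of any proposal.
final class IndentSketch {
    // body is the raw text between the delimiters, starting with the
    // newline that follows the opening delimiter.
    static String stripIndent(String body) {
        if (!body.startsWith("\n")) {
            // Assumption: text on the same line as the opening delimiter
            // is out of scope for this sketch.
            throw new IllegalArgumentException("expected a newline after the opening delimiter");
        }
        String[] lines = body.substring(1).split("\n", -1);

        // The indent is the exact leading whitespace of the first line;
        // tabs and spaces are compared as-is, never translated.
        String first = lines[0];
        int n = 0;
        while (n < first.length() && (first.charAt(n) == ' ' || first.charAt(n) == '\t')) {
            n++;
        }
        String indent = first.substring(0, n);

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            // Assumed error rule: every line must begin with the indent.
            // Note that this also rejects blank lines.
            if (!lines[i].startsWith(indent)) {
                throw new IllegalArgumentException("line " + (i + 1) + " does not start with the indent");
            }
            if (i > 0) {
                out.append('\n');
            }
            out.append(lines[i].substring(indent.length()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "\n    bar\n                baz\n      bla\n    qux";
        System.out.println(stripIndent(body).equals("bar\n            baz\n  bla\nqux"));
    }
}
```

With a four-space indent on the first line, extra indentation on later lines survives, which is how a value like "bar\n            baz\n  bla\nqux" comes out.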

If you need leading whitespace, you'll need to prefix this in a  
separate string and concatenate them, or add them on the same line.  
So, if you need "\t\nfoo\n", and you don't want to use \t, you could  
write it as:

String foo = ""        < - You don't see it, but there's a tab here.

or as:

String foo = "\t" + """

  --Reinier Zwitserloot

On Mar 1, 2009, at 18:53, Jeremy Manson wrote:

> The plus side of the escaped String approach is that you can then use
> any language, not just regexps.  Also, escaped Strings might be a plus
> for security purposes.
> Also, I'm not a big fan of the idea of embedding the
> domain-specific-language-du-jour into my programming language.  I see
> it as a slippery slope.  It's regexps today, but it's XML tomorrow
> (I'm looking at you, Scala).
> Jeremy
> On Sun, Mar 1, 2009 at 1:16 AM, Adrian Kuhn <akuhn at> wrote:
>> On 01.03.2009, at 09:58, Jeremy Manson wrote:
>>> Frankly, to me, the big win would actually not be multiline  
>>> literals,
>>> but would be escaped String literals.  I'm sick of writing all of my
>>> regexps with twice as many \ characters as they need.
>> In this case, why not allow regexps to be written literally in source
>> code? As is done in many other languages. Although, this change would
>> couple the regexp API with the language. But maybe here, the benefit
>> might be worth the costs.
>> --AA
