> Second, this proposal should explain why it's reinventing the wheel
> instead of standardizing existing, very successful, prior art.  Answer
> the question: “what compelling advantages does this syntax have over
> Python's?”

Sure.

First of all, I will admit up front that I have not written much Python (a 
couple weeks ago, "much" would have been "any") and I may not fully understand 
their string literals. So I'll start by describing my understanding of the 
design in question; then I'll critique the design as I understand it. So if 
something in this section is wrong, please forgive any related mistakes in the 
critique.

Python offers a `"""` string which is almost the same as the `"` string:

        * Every character between the first `"""` and the second `"""` is
          part of its contents.
        * Escapes are processed normally.
        * There is no special behavior with regard to whitespace.

The only difference is that a `"""` string allows real, unescaped newlines in 
it, while a `"` string forbids them. (And, of course, since the delimiter is 
`"""`, a lone `"` or `""` inside the string is interpreted literally.)
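
As a rough illustration of that understanding, here is a small Python sketch. The variable names are made up for the example; nothing here comes from the proposal itself.

```python
# The same value written with "..." and with """...""".
single = "line one\nline two with \"quotes\" inside"
triple = """line one
line two with "quotes" inside"""

# Escapes are still processed inside the triple-quoted form.
tabbed = """tab\there"""

# One or two consecutive quote marks are ordinary content.
two_quotes = """a""b"""

print(single == triple)   # True
print(len(two_quotes))    # 4
```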

This approach is really simple, which is a plus, but it has a number of issues.



CONTENT FORMATTING

A number of aspects of the design combine to make `"""` strings harder to read 
than they should be:

        * You can't indent the contents of a `"""` string to match the code
          it's in. This is actually pretty shocking considering how sensitive
          Python is to indentation, and it necessitates a number of strange
          hacks (for instance, Python's `help()` function unindents all but
          the first line of docstrings).
        * You can't put all of the contents against the left margin, either,
          because a newline right after the `"""` is counted as part of the
          string's contents. (You can use a backslash to work around this.)
        * The last line of the string also has to have the delimiter in it,
          because again, a newline right before the `"""` is counted as part
          of the string's contents. (You can use a backslash to work around
          this, but the backslash does *not* go in the mirror image of its
          position at the start of the string, so good luck remembering it.)

In other words, the first and last lines have to be adulterated by adding a 
`"""`, and the middle lines can't be indented to line up with either the 
surrounding code or the beginning of the first line. If one of the selling 
points of this feature is that you just stick your contents in verbatim without 
alteration, that isn't great.
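
The three complaints above can be seen in a few lines of Python. This is only a sketch: the function names are made up, and `textwrap.dedent` and `inspect.cleandoc` are the standard-library helpers commonly used to undo the indentation after the fact.

```python
import inspect
import textwrap


def leading_newline():
    # Starting the content on its own line puts a newline at the front.
    return """
first line
second line"""


def indented_to_match():
    # Indenting the continuation to match the code leaks into the value.
    return """first line
    second line"""


def backslash_workaround():
    # A backslash swallows the newline after the opening delimiter, but the
    # indentation (including the run before the closing delimiter) remains.
    return """\
    first line
    second line
    """


print(repr(leading_newline()))       # '\nfirst line\nsecond line'
print(repr(indented_to_match()))     # 'first line\n    second line'
print(repr(backslash_workaround()))  # '    first line\n    second line\n    '

# Cleaning up afterwards is possible, but it is a separate step.
print(repr(textwrap.dedent(backslash_workaround())))  # 'first line\nsecond line\n'
print(repr(inspect.cleandoc(indented_to_match())))    # 'first line\nsecond line'
```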

This is such a problem that, in researching `"""` to be sure I understood how 
it works, I came across a Stack Overflow question whose answers are full of 
people recommending a different, more highly punctuated, feature instead: 
<http://stackoverflow.com/questions/1520548/how-does-pythons-triple-quote-string-work>

(There is an alternate design which would fix the beginning and end problems: 
make a newline after the opening delimiter and before the closing delimiter 
mandatory and part of the delimiter. You might then choose to fix the 
indentation problem by taking the whitespace between the closing delimiter and 
the newline before it as the amount of indentation for the entire string, and 
removing that much indentation from each line. But that's not what Python does, 
and it's not what you seem to be proposing.)
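
To make that alternate rule concrete, here is a hedged Python sketch of how it could be applied to the raw text between the delimiters. The function name, the error messages, and the treatment of blank lines and of the empty string are all assumptions made for illustration; neither Python nor any proposal in this thread specifies them.

```python
def strip_delimiter_indent(raw):
    """Apply the alternate rule to the raw text between the delimiters."""
    if not raw.startswith("\n"):
        raise ValueError("a newline must follow the opening delimiter")
    # Everything after the last newline is the closing delimiter's indentation.
    head, _, closing_indent = raw.rpartition("\n")
    if closing_indent.strip():
        raise ValueError("the closing delimiter must sit on its own line")
    if not head:
        # Assumption: a single newline serves both delimiters (empty string).
        return ""
    lines = []
    for line in head[1:].split("\n"):
        if line.startswith(closing_indent):
            lines.append(line[len(closing_indent):])
        elif not line.strip():
            lines.append("")  # assumption: blank lines need no indentation
        else:
            raise ValueError("line is indented less than the closing delimiter")
    return "\n".join(lines)


raw = "\n    <tr>\n        <td>x</td>\n    </tr>\n    "
print(strip_delimiter_indent(raw))
```

Under these assumptions the example prints the three lines of markup with the outer four spaces removed and the inner indentation preserved.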



BREAKING UP EXPRESSIONS

String literals are expressions, and in fact, they are expressions with no side 
effects. To do anything useful, they *must* be put into a larger expression. 
Often this expression is an assignment, but it could be anything—concatenation, 
method call, function parameter, you name it.

This creates a challenge for multiline strings, because they can become very 
large and effectively break up the expression they're in. The 
continuation-quote-based multiline strings I'm proposing are aimed primarily at 
relatively short strings*, where this is less of a concern. But `"""` aims to 
be used not only for short strings, but for ones which may be many dozens or 
even hundreds of lines long. You're going to end up with code like:

        print("""<?xml version="1.0"?>
        <catalog>
                <book id="bk101" empty="">
                        ...
                        ...
                        ...a hundred more lines of XML with interpolations in it...
                        ...
                        ...
                </book>
        </catalog>""")

What does that `)` mean? Who knows? We saw the beginning of the expression an 
hour and a half ago. (It's common to avoid this issue by assigning the string 
to a constant even if it's only going to be used once, but that just changes 
the problem a little—now you're trying to remember the name of a local variable 
declared a hundred lines ago.)

Heredocs cleverly avoid this issue by not trying to put the literal's contents 
in the middle of the expression. Instead, they put a short placeholder in the 
expression, then start the contents on the next line. The expression is 
readable as an expression, while the contents of the literal are adjacent but 
separate. That's why I think they're a better solution than `"""` for truly 
massive string literals.

* This is something I am not saying in the proposal, but I really should.



NESTING

Another problem is that `"""` is the only delimiter you get besides the 
ordinary `"`. That's not so bad, though, right? It's such an uncommon sequence 
of characters, surely you'll never encounter it?

Well, sure...until you try to generate code.

For instance, suppose you're writing a web app using a barebones Swift 
framework and you have a lot of code like this:

        response.send("""<tr>
                <td>\(name)</td>
                <td>\(value)</td>
        </tr>
        """)

Every 90s Perl hacker knows what a pain this is, and every 90s Perl hacker 
knows the solution: a template language. Hack together some kind of simple 
syntax for embedding commands in a file of content, and then convert it into 
runnable code with a tool that does things like:

        print("""
        response.send("""\(escapedContent)""")
        """)

...oh. Wait a minute there.

To get around this, you really need to support, not two delimiters, but *n* 
delimiters. Heredocs let you choose an arbitrary delimiter. C++ lets you 
augment the delimiter with arbitrary characters. Perl's `qq` construct lets you 
choose a single character, but it can be almost anything you want (and some of 
them nest). I'm thinking about letting you extend the delimiter with an 
arbitrary number of underscores. All of these solutions have in common that 
they don't just have "primary" and "alternate" delimiters, but an effectively 
endless number of them.

`"""` does not have this feature—you just have the primary delimiter and the 
alternate delimiter, and if neither of them works for you, you have to escape. 
That isn't ideal.
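
The same squeeze is easy to reproduce in Python, which does accept `'''` as a second triple delimiter. The sketch below is illustrative only: `response.send` is just text inside a string, and `as_python_literal` is a hypothetical helper that leans on `repr()` to do the escaping a code generator would otherwise have to do by hand.

```python
import ast

# Code a template tool might want to emit; it already uses """ itself.
inner = 'response.send("""<tr>...</tr>""")'

# Wrapping it in """ means escaping each embedded delimiter by hand.
escaped = """response.send(\"""<tr>...</tr>\""")"""

# The second triple delimiter, ''', works here...
alternate = '''response.send("""<tr>...</tr>""")'''

# ...but content that uses both delimiters leaves nothing to switch to.
both = inner + " # and '''"


def as_python_literal(content):
    # Hypothetical helper: let repr() choose the quoting and escaping.
    return repr(content)


# The generated literal evaluates back to the original content.
round_trip = ast.literal_eval(as_python_literal(both))
print(round_trip == both)  # True
```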



RUNAWAY LITERALS

`"""` does not offer much help with preventing or diagnosing runaway literals 
or highlighting code with half-written literals. Heredocs don't either, but I 
envision heredocs being used less often than `"""` strings would be, since 
continuation quotes would handle shorter strings.



SYNTAX HIGHLIGHTING

So, let's talk about this:

>>  (like Python's """ strings) which trick some syntax
>>  highlighters into working some of the time with some contents, we don't
>>  think this occasional, accidental compatibility is a big enough gain to
>>  justify changing the design.
> 
> I've never seen a syntax highlighter have problems with it, I don't see
> how it *could* ever cause a problem, and lastly I think it's both naïve
> and presumptuous to call these effects accidental.

I call these effects "accidental" because the syntax highlighter was not 
designed to handle the `"""`; it just happens to handle it correctly because it 
misinterprets a `"""` string as an empty `"` string, followed by a non-empty 
`"` string, followed by another empty `"` string. It's "accidental" from the 
perspective of the syntax highlighter designer, not the language designer, who 
probably intended that to happen.

And it only works in a specific subset of cases. It breaks if:

* The syntax highlighter tries to apply smarter per-language rules.
* The syntax highlighter assumes that strings are not allowed to be multi-line. 
(This is true of many languages, including C derivatives and Swift 2.)
* The string literal contains any `"` characters, which `"""` is often used in 
order to permit.
* The string literal contains any escapes or special features that the syntax 
highlighter misinterprets, like an interpolation which itself contains a string 
literal.
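
A toy version of such a highlighter shows both the lucky case and two of the failures. The regular expression below is a hypothetical stand-in for a highlighter that only knows single-line `"` strings; it is not taken from any real tool.

```python
import re

# Hypothetical rule: a quote, any run of characters that are neither quotes
# nor newlines, and a closing quote.
NAIVE_STRING = re.compile(r'"[^"\n]*"')


def naive_strings(source):
    # Return the spans this naive rule would color as string literals.
    return NAIVE_STRING.findall(source)


# Lucky case: empty string, real content, empty string.
print(naive_strings('x = """abc"""'))
# ['""', '"abc"', '""']

# An embedded quote: the word hi falls outside every string.
print(naive_strings('x = """say "hi" now"""'))
# ['""', '"say "', '" now"', '""']

# A real newline: none of the content is colored as a string.
print(naive_strings('x = """a\nb"""'))
# ['""', '""']
```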

Yes, it will often work, or at least sort-of work. But I just don't see that as 
very valuable.



WHAT'S GOOD ABOUT `"""`?

In my opinion, the best thing about `"""` (the language feature) is `"""` (the 
token).

A sequence of three quote marks is a fantastic token for a feature meant to 
create long string literals. It clearly has something to do with string 
literals, but it cannot be an empty string, because there are too many quote 
marks—that is, it's too long. It's a really clever mnemonic which also parses 
unambiguously.

I've spoken before in this thread and others about potentially using `"""` as 
an alternate delimiter (which could be extended to `"""""` and beyond). I'm 
also considering the idea that it might be a good token for a Perl-style 
heredoc syntax:

        print(""" + e""")
        It was a dark and stormy \(timeOfDay) when 
        """
        the Swift core team invented the \(interpolation) syntax.
        """

Nesting could be achieved with a version of whatever alternate delimiter syntax 
we use for `"` strings. For instance, if we adopted the `_"foo"_` syntax I 
sketched:

        print(_"""_)
        response.send(""")
        \(escapedContent)
        """
        _"""_



(P.S. If this post seems way too long to have been written in a couple hours, 
that's because I've been drafting a version of it on and off for a day or two; 
it just so happened that Dave directly asked me to confront `"""` today.)

-- 
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
