> Awesome.  Some specific suggestions below, but feel free to iterate in a pull 
> request if you prefer that.

I've adopted these suggestions in some form, though I also ended up rewriting 
the explanation of why the feature was designed as it is and fusing it with 
material from "Alternatives considered".

(Still not sure who I should list as a co-author. I'm currently thinking John, 
Tyler, and maybe Chris? Who's supposed to go there?)

Multiline string literals

Proposal: SE-NNNN 
<https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
Author(s): Brent Royal-Gordon <https://github.com/brentdax>
Status: Second Draft
Review manager: TBD
 
Introduction

In Swift 2.2, the only means to insert a newline into a string literal is the 
\n escape. String literals specified in this way are generally ugly and 
unreadable. We propose a multiline string feature inspired by English 
punctuation which is a straightforward extension of our existing string 
literals.

This proposal is one step in a larger plan to improve how string literals 
address various challenging use cases. It is not meant to solve all problems 
with escaping, nor to serve all use cases involving very long string literals. 
See the "Future directions for string literals in general" section for a sketch 
of the problems we ultimately want to address and some ideas of how we might do 
so.

Swift-evolution threads: multi-line string literals (April) 
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160418/015500.html>, 
multi-line string literals (December) 
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002349.html>
 
Draft Notes

Removes the comment feature, which was felt to be an unnecessary complication. 
This and the backslash feature have been listed as future directions. 

Loosens the specification of diagnostics, suggesting instead of requiring 
fix-its.

Splits a "Rationale" section out of the "Proposed solution" section.

Adds extensive discussion of other features which would combine with this one.

I've listed only myself as an author because I don't want to put anyone else's 
name to a document they haven't seen, but there are others who deserve to be 
listed (John Holdsworth at least). Let me know if you think you should be 
included.

 
Motivation

As Swift begins to move into roles beyond app development, code which needs to 
generate text becomes a more important use case. Consider, for instance, 
generating even a small XML string:

let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
The string is practically unreadable, its structure drowned in escapes and 
run-together lines; it looks like little more than line noise. We can improve 
its readability somewhat by concatenating separate strings for each line and 
using real tabs instead of \t escapes:

let xml = "<?xml version=\"1.0\"?>\n" + 
          "<catalog>\n" + 
          " <book id=\"bk101\" empty=\"\">\n" + 
          "     <author>\(author)</author>\n" + 
          " </book>\n" + 
          "</catalog>"
However, this creates a more complex expression for the type checker, and 
there's still far more punctuation than ought to be necessary. If the most 
important goal of Swift is making code readable, this kind of code falls far 
short of that goal.

 
Proposed solution

We propose that, when Swift is parsing a string literal, if it reaches the end 
of the line without encountering an end quote, it should look at the next line. 
If it sees a quote at the beginning (a "continuation quote"), the string 
literal contains a newline and then continues on that line. Otherwise, the 
string literal is unterminated and syntactically invalid.

Our sample above could thus be written as:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          "     <author>\(author)</author>
          " </book>
          "</catalog>"
If the second or subsequent lines had not begun with a quotation mark, or the 
trailing quotation mark after the </catalog> tag had not been included, Swift
would have emitted an error.

 
Rationale

This design is rather unusual, and it's worth pausing a moment to explain why 
it has been chosen.

The traditional design for this feature, seen in languages like Perl and 
Python, simply places one delimiter at the beginning of the literal and another 
at the end. Individual lines in the literal are not marked in any way. 

We think continuation quotes offer several important advantages over the 
traditional design:

They help the compiler pinpoint errors in string literal delimiting. 
Traditional multiline strings have a serious weakness: if you forget the 
closing quote, the compiler has no idea where you wanted the literal to end. The 
literal simply continues until the compiler encounters another quote (or the end of
the file). If you're lucky, the text after that quote is not valid code, and 
the resulting error will at least point you to the next string literal in the 
file. If you're unlucky, you'll get a seemingly unrelated error several 
literals later, an unbalanced brace error at the end of the file, or perhaps 
even code that compiles but does something totally wrong.

(This is not a minor concern. Many popular languages, including C and Swift 2, 
specifically reject newlines in string literals to prevent this from happening.)

Continuation quotes provide the compiler with redundant information about your 
intent. If you forget a closing quote, the continuation quotes give the 
compiler a very good idea of where you meant to put it. The compiler can point 
you to (or at least very near) the end of the literal, where you want to insert 
the quote, rather than showing you the beginning of the literal or even some 
unrelated error later in the file that was caused by the missing quote.

Temporarily unclosed literals don't make editors go haywire. The syntax 
highlighter has the same trouble parsing half-written, unclosed traditional 
quotes that the compiler does: It can't tell where the literal is supposed to 
end and the code should begin. It must either apply heuristics to try to guess 
where the literal ends, or incorrectly color everything between the opening 
quote and the next closing quote as a string literal. This can cause the file's 
coloring to alternate distractingly between "string literal" and "running code".

Continuation quotes give the syntax highlighter enough context to guess at the 
correct coloration, even when the string isn't complete yet. Lines with a 
continuation quote are literals; lines without are code. At worst, the syntax 
highlighter might incorrectly color a few characters at the end of a line, 
rather than the remainder of the file.

They separate indentation from the string's contents. Traditional multiline 
strings usually include all of the content between the start and end 
delimiters, including leading whitespace. This means that it's usually 
impossible to indent a multiline string, so including one breaks up the flow of 
the surrounding code, making it less readable. Some languages apply heuristics 
or mode switches to try to remove indentation, but like all heuristics, these 
are mistake-prone and murky.

Continuation quotes neatly avoid this problem. Whitespace before the 
continuation quote is indentation used to format the source code; whitespace 
after the continuation quote is part of the string literal. The interpretation 
of the code is perfectly clear to both compiler and programmer.

They improve the ability to quickly recognize the literal. Traditional 
multiline strings don't provide much visual help. To find the end, you must 
visually scan until you find the matching delimiter, which may be only one or a 
few characters long. When looking at a random line of source, it can be hard to 
tell at a glance whether it's code or literal. Syntax highlighting can help 
with these issues, but it's often unreliable, especially with advanced, 
idiosyncratic string literal features like multiline strings.

Continuation quotes solve these problems. To find the end of the literal, just 
scan down the column of continuation characters until they end. To figure out 
if a given line of source is part of a literal, just see if it starts with a 
quote mark. The meaning of the source becomes obvious at a glance.

Nevertheless, the traditional design does have a few advantages:

It is simpler. Although continuation quotes are more complex, we believe that 
the advantages listed above pay for that complexity.

There is no need to edit the intervening lines to add continuation quotes. 
While the additional effort required to insert continuation quotes is an 
important downside, we believe that tool support, including both compiler 
fix-its and perhaps editor support for commands like "Paste as String Literal", 
can address this issue. In some editors, new features aren't even necessary; 
TextMate, for instance, lets you insert a character on several lines 
simultaneously. And new tool features could also address other issues like 
escaping embedded quotes.

Naïve syntax highlighters may have trouble understanding this syntax. This is 
true, but naïve syntax highlighters generally have terrible trouble with 
advanced string literal constructs; some struggle with even basic ones. While 
there are some designs (like Python's """ strings) which trick some syntax 
highlighters into working some of the time with some contents, we don't think 
this occasional, accidental compatibility is a big enough gain to justify 
changing the design.

It looks funny—quotes should always be in matched pairs. We aren't aware of 
another programming language which uses unbalanced quotes in string literals, 
but there is one very important precedent for this kind of formatting: natural 
languages. English, for instance, uses a very similar format for quoting 
multiple lines of dialog by the same speaker. As an English Stack Exchange 
answer illustrates <http://english.stackexchange.com/a/96613/64636>:

“That seems like an odd way to use punctuation,” Tom said. “What harm would 
there be in using quotation marks at the end of every paragraph?”

“Oh, that’s not all that complicated,” J.R. answered. “If you closed quotes at 
the end of every paragraph, then you would need to reidentify the speaker with 
every subsequent paragraph.

“Say a narrative was describing two or three people engaged in a lengthy 
conversation. If you closed the quotation marks in the previous paragraph, then 
a reader wouldn’t be able to easily tell if the previous speaker was extending 
his point, or if someone else in the room had picked up the conversation. By 
leaving the previous paragraph’s quote unclosed, the reader knows that the 
previous speaker is still the one talking.”

“Oh, that makes sense. Thanks!”
In English, omitting the ending quotation mark tells the text's reader that the 
quote continues on the next line, while including a quotation mark at the 
beginning of the next line reminds the reader that they're in the middle of a 
quote.

Similarly, in this proposal, omitting the ending quotation mark tells the 
code's reader (and compiler) that the string literal continues on the next 
line, while including a quotation mark at the beginning of the next line 
reminds the reader (and compiler) that they're in the middle of a string 
literal.

On balance, we think continuation quotes are the best design for this problem.

 
Detailed design

When Swift is parsing a string literal and reaches the end of a line without 
finding a closing quote, it examines the next line, applying the following 
rules:

If the next line begins with a continuation quote, optionally preceded by 
whitespace, then the string literal contains a newline followed by the contents 
of the string literal starting after that quote. (This line may itself have no 
closing quote, in which case the same rules apply to the line which follows.)

If the next line contains anything else, Swift raises a syntax error for an 
unterminated string literal. 

The exact error messages and diagnostics provided are left to the implementers 
to determine, but we believe it should be possible to provide two fix-its which 
will help users learn the syntax and correct string literal mistakes:

Insert " at the end of the current line to terminate the quote.

Insert " at the beginning of the next line (with some indentation heuristics) 
to continue the quote on the next line.
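To make these rules concrete, here is a minimal Swift sketch of how a lexer might apply them, assuming the source has already been split into lines. The names lexMultilineLiteral and MultilineLiteralError are hypothetical, not part of the Swift compiler. The sketch only decides where the literal ends and which characters belong to it; it deliberately leaves escapes and interpolations unprocessed and does not handle string literals nested inside an interpolation.

```swift
/// Hypothetical error for this sketch; `line` is the 0-based index of the
/// line that ended without a closing quote or a following continuation quote.
enum MultilineLiteralError: Error, Equatable {
    case unterminated(line: Int)
}

/// Hypothetical model of the continuation-quote rules (not the real lexer).
/// Reads the literal whose opening quote is the first `"` on `lines[start]`
/// and returns its raw contents plus the index of the line with the closing quote.
func lexMultilineLiteral(_ lines: [String], start: Int) throws -> (contents: String, endLine: Int) {
    let first = lines[start]
    guard let open = first.firstIndex(of: "\"") else {
        preconditionFailure("no opening quote on the start line")
    }
    var contents = ""
    var lineIndex = start
    var rest: Substring = first[first.index(after: open)...]
    while true {
        // Scan the rest of this line for a closing quote, copying any
        // backslash escape (such as \") through as two raw characters.
        var i = rest.startIndex
        while i < rest.endIndex {
            let c = rest[i]
            if c == "\"" { return (contents, lineIndex) }
            contents.append(c)
            i = rest.index(after: i)
            if c == "\\" && i < rest.endIndex {
                contents.append(rest[i])
                i = rest.index(after: i)
            }
        }
        // End of line without a closing quote: the next line must begin with
        // optional spaces or tabs followed by a continuation quote.
        let next = lineIndex + 1
        guard next < lines.count else {
            throw MultilineLiteralError.unterminated(line: lineIndex)
        }
        let body = lines[next].drop(while: { $0 == " " || $0 == "\t" })
        guard body.first == "\"" else {
            throw MultilineLiteralError.unterminated(line: lineIndex)
        }
        // The literal gains a newline; indentation before the quote is dropped.
        contents.append("\n")
        rest = body.dropFirst()
        lineIndex = next
    }
}
```

In this sketch the line index carried by the error is exactly where both suggested fix-its would apply: a closing quote at the end of that line, or a continuation quote at the start of the line after it.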

 
Impact on existing code

Failing to close a string literal before the end of the line is currently a 
syntax error, so no valid Swift code should be affected by this change.

 
Future directions for multiline string literals

We could treat comments that appear before a continuation quote as whitespace, 
and permit empty lines in the middle of string literals. This would allow you 
to comment out whole lines in the literal.

We could allow you to put a trailing backslash on a line to indicate that the 
newline isn't "real" and should be omitted from the literal's contents.
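One way the trailing-backslash idea could work is sketched below in Swift, assuming the lexer has already collected the raw text of each line of the literal (the text after each continuation quote, with escapes unprocessed). joinSegments is a hypothetical name, and the sketch assumes that an escaped backslash (\\) at the end of a line is ordinary content rather than a line continuation.

```swift
/// Hypothetical sketch: joins the raw per-line segments of a multiline
/// literal, omitting the newline after any segment ending in a lone backslash.
func joinSegments(_ segments: [String]) -> String {
    var result = ""
    for (index, segment) in segments.enumerated() {
        let isLast = index == segments.count - 1
        // An odd number of trailing backslashes means the final one is not
        // half of an escaped backslash (\\), so it suppresses the newline.
        let trailing = segment.reversed().prefix(while: { $0 == "\\" }).count
        if !isLast && trailing % 2 == 1 {
            // Drop the backslash itself and do not emit a newline.
            result.append(contentsOf: segment.dropLast())
        } else {
            result.append(contentsOf: segment)
            if !isLast { result.append("\n") }
        }
    }
    return result
}
```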

 
Future directions for string literals in general

There are other issues with Swift's string handling which this proposal 
intentionally does not address:

Reducing the amount of double-backslashing needed when working with regular 
expression libraries, Windows paths, source code generation, and other tasks 
where backslashes are part of the data.

Alternate delimiters or other strategies for writing strings with " characters 
in them.

Accommodating code formatting concerns like hard wrapping and commenting.

String literals consisting of very long pieces of text which are best 
represented completely verbatim, with minimal alteration.

This section briefly outlines some future proposals which might address these 
issues. Combined, we believe they would address most of the string literal use 
cases which Swift is currently not very good at.

Please note that these are simply sketches of hypothetical future designs; they 
may radically change before proposal, and some may never be proposed at all. 
Many, perhaps most, will not be proposed for Swift 3. We are sketching these 
designs not to propose and refine these features immediately, but merely to 
show how we think they might be solved in ways which complement this proposal.

 
String literal modifiers

A string literal modifier is a cluster of identifier characters which goes 
before a string literal and adjusts the way it is parsed. Modifiers only alter
the interpretation of the text in the literal, not the type of data it 
produces; for instance, there will never be something like the 
UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a 
feature; lowercase characters disable a feature.

Modifiers can be attached to both single-line and multiline literals, and could 
also be attached to other literal syntaxes which might be introduced in the 
future. When used with multiline strings, only the starting quote needs to 
carry the modifiers, not the continuation quotes.

Modifiers are an extremely flexible feature which can be used for many 
purposes. Of the ideas listed below, we believe the e modifier is an urgent
addition which should be included in Swift 3 if at all possible; the others are 
less urgent and most of them could be deferred, or at least added later if time 
allows.

Escape disabling: e"\\\" (string with three backslash characters)

Fine-grained escape disabling: i"\(foo)\n" (the string \(foo) followed by a 
newline); eI"\(foo)\n" (the contents of foo followed by the string \n); 
b"\w+\n" (the string \w+ followed by a newline)

Alternate delimiters: _ has no lowercase form, so it could be used to allow 
strings with internal quotes: _"print("Hello, world!")"_, __"print("Hello, 
world!")"__, etc.

Whitespace normalization: changes all runs of whitespace in the literal to 
single space characters; this would allow you to use multiline strings purely 
to improve code formatting.

alert.informativeText =
    W"\(appName) could not typeset the element “\(title)” because 
     "it includes a link to an element that has been removed from this 
     "book."
Localization: 

alert.informativeText =
    LW"\(appName) could not typeset the element “\(title)” because 
      "it includes a link to an element that has been removed from this 
      "book."
Comments: Embedding comments in string literals might be useful for literals 
containing regular expressions or other code.
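To illustrate the effect of the whitespace normalization modifier (W), here is a small Swift function modeling it on a literal's text after continuation quotes have been processed. normalizeWhitespace is a hypothetical name, and the sketch assumes that only the literal's own text is normalized (not the values of interpolated expressions) and that leading or trailing whitespace is collapsed rather than trimmed, since the description above only mentions collapsing runs.

```swift
/// Hypothetical model of the W modifier: every run of whitespace, including
/// the newlines contributed by continuation quotes, becomes one space.
func normalizeWhitespace(_ text: String) -> String {
    var result = ""
    var inWhitespace = false
    for character in text {
        if character.isWhitespace {
            // Emit a single space for the first whitespace character of a run.
            if !inWhitespace { result.append(" ") }
            inWhitespace = true
        } else {
            result.append(character)
            inWhitespace = false
        }
    }
    return result
}
```

Applied to the alert example above, the text "because " on the first line and "it includes" on the next join with exactly one space between them.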

Eventually, user-specified string modifiers could be added to Swift, perhaps as 
part of a hygienic macro system. It might also become possible to change the 
default modifiers applied to literals in a particular file or scope.

 
Heredocs or other "verbatim string literal" features

Sometimes it really is best to just splat something else down in the middle of 
a file full of Swift source code. Maybe the file is essentially a template and 
the literals are a majority of the code's contents, or maybe you're writing a 
code generator and just want to get string data into it with minimal fuss, or 
maybe people unfamiliar with Swift need to be able to edit the literals. 
Whatever the reason, the normal string literal syntax is just too burdensome.

One approach to this problem is heredocs. A heredoc allows you to put a 
placeholder for a literal on one line; the contents of the literal begin on the 
next line, running up to some delimiter. It would be possible to put multiple 
placeholders in a single line, and to apply string modifiers to them.

In Swift, this might look like:

print(#to("---") + e#to("END"))
It was a dark and stormy \(timeOfDay) when 
---
the Swift core team invented the \(interpolation) syntax.
END
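A parser might handle the heredoc example above roughly as follows: after finishing the line containing the placeholders, it switches into a mode that reads the following lines verbatim until each placeholder's delimiter appears, in order. This Swift sketch assumes that a delimiter must make up an entire line and that the newline immediately before a delimiter is not part of the body; collectHeredocBodies is a hypothetical name.

```swift
/// Hypothetical sketch of heredoc body collection. `terminators` lists the
/// delimiters named by the placeholders on one line, in order; `lines` are
/// the source lines following that line. Returns each body and the number
/// of lines consumed, or nil if a terminator is never found.
func collectHeredocBodies(terminators: [String], lines: [String]) -> (bodies: [String], consumed: Int)? {
    var bodies: [String] = []
    var index = 0
    for terminator in terminators {
        var bodyLines: [String] = []
        // Everything up to a line equal to the terminator is literal text.
        while index < lines.count && lines[index] != terminator {
            bodyLines.append(lines[index])
            index += 1
        }
        // Ran off the end of the file without finding the terminator.
        guard index < lines.count else { return nil }
        index += 1  // skip the terminator line itself
        bodies.append(bodyLines.joined(separator: "\n"))
    }
    return (bodies, index)
}
```

Because the bodies start only on the line after the placeholders, the parser has to remember pending placeholders across a line boundary, which is the kind of parser mode that continuation-quoted strings avoid.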
Another possible approach would be to support traditional multiline string 
literals bounded by a different delimiter, like """. This might look like:

print("""
It was a dark and stormy \(timeOfDay) when 
""" + e"""
the Swift core team invented the \(interpolation) syntax.
""")
Although heredocs could make a good addition to Swift eventually, there are 
good reasons to defer them for now. Please see the "Alternatives considered" 
section for details.

 
First-class regular expressions

Members of the core team are interested in regular expressions, but they don't 
want to just build a literal that wraps PCRE or libicu; rather, they aim to 
integrate regexes into the pattern matching system and give them a deep, Perl 
6-style rethink. This would be a major effort, far beyond the scope of Swift 3.

In the meantime, the e modifier and perhaps other string literal modifiers will 
make it easier to specify regular expressions in string literals for use with 
NSRegularExpression and other libraries accessible from Swift.

 
Alternatives considered

 
Requiring no continuation character

The main alternative is to not require a continuation quote, and simply extend 
the string literal from the starting quote to the ending quote, including all 
newlines between them. For example:

let xml = "<?xml version=\"1.0\"?>
<catalog>
    <book id=\"bk101\" empty=\"\">
        <author>\(author)</author>
    </book>
</catalog>"
This alternative is extensively discussed in the "Rationale" section above.

 
Skip multiline strings and just support heredocs

There are definitely cases where a heredoc would be a better solution, such as 
generated code or code which is mostly literals with a little Swift sprinkled 
around. On the other hand, there are also cases where multiline strings are 
better: short strings in code which is meant to be read. If a single feature 
can't handle them both well, there's no shame in supporting the two features 
separately.

It makes sense to support multiline strings first because:

They extend existing syntax instead of introducing new syntax.

They are much easier to parse; heredocs require some kind of mode in the parser 
which kicks in at the start of the next line, whereas multiline string literals 
can be handled in the lexer.

As discussed in "Rationale", they offer better diagnostics, code formatting, 
and visual scannability.

 
Use a different delimiter for multiline strings

The initial suggestion was that multiline strings should use a different 
delimiter, """, at the beginning and end of the string, with no continuation 
characters between. Like heredocs, this might be a good alternative for certain 
use cases, but it has the same basic flaws as the "no continuation character" 
solution.

-- 
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
