Now that I've had some sleep, I'm revising my opinion about the last line. I 
took a few hours to collect my thoughts in a gist; here is a formatted 
version of it: 
https://gist.github.com/DevAndArtist/dae76d4e3d4e49b1fab22ef7e86a87a9

Simple ‘multi-line string literal’ model

Core features:

1. Omitting (most) backslashes for ".
2. Altering the string with implicit new line injection at the end of each line.
Consequences of #1:

To avoid escaping the quote character, the multi-line string literal will be 
delimited by tripled quotes """, similar to other programming languages.

When a standard string literal contains at least 5 quotes, the multi-line 
string literal will be shorter: its delimiters cost 4 extra characters, but 
it saves one backslash per quote.

"<a href=\"\(url)\" id=\"link\(i)\" class=\"link\">"    // With escapes
"""<a href="\(url)" id="link\(i)" class="link">"""      // With tripled literals
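The break-even point of five quotes comes from simple counting. A small Swift sketch of that arithmetic follows. The function names are made up for illustration, and `other` stands for the characters that are not quotes, which cancel out of the comparison.

```swift
// Illustrative only: source length of a literal whose content has
// `quotes` double-quote characters plus `other` characters.
func standardLiteralLength(quotes: Int, other: Int) -> Int {
    // Two `"` delimiters, and every inner quote is written as `\"`.
    return 2 + other + 2 * quotes
}

func tripledLiteralLength(quotes: Int, other: Int) -> Int {
    // Six delimiter characters (`"""` twice); inner quotes need no backslash.
    return 6 + other + quotes
}

// Smallest number of inner quotes for which the tripled form is strictly
// shorter. `other` cancels out, so 0 is used.
func breakEvenQuoteCount() -> Int {
    var quotes = 0
    while tripledLiteralLength(quotes: quotes, other: 0) >= standardLiteralLength(quotes: quotes, other: 0) {
        quotes += 1
    }
    return quotes
}

print(breakEvenQuoteCount()) // 5
```

At 4 quotes both forms have the same length. The anchor-tag example above has six inner quotes, so the tripled form saves two characters.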
Consequences of #2:

To fully support this feature, the design needs some compromises for the sake 
of simplicity and intuitiveness.

This feature needs precision for leading and trailing spaces.
Additionally, one needs a way to disable new line injection in order to 
support code formatting.
Two ways of writing a multi-line string literal:

The single-line version """abc""" is trivial and was already shown above.

The multi-line version comes with a few compromises to keep the rules simple:

"""   // DS (delimiter start)
foo   // s0
foo   // s1
foo   // s2
"""   // DE (delimiter end)
The string content is always written between the lines DS and DE (delimiter 
lines).

To avoid going down the continuation-quotes path, the left (or leading) 
precision is handled by the closing delimiter (1. compromise). The closing 
delimiter is also responsible for the indent algorithm, which calculates the 
stripping prefix from line DE and strips it from lines s0 to sn.
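Here is a minimal sketch of the indent rule. It assumes that only spaces and tabs count as indentation, and the helper names are hypothetical. Lines that lack the full prefix are left as they are here, so the warning behavior described further down is not modeled.

```swift
// Hypothetical helpers; names are illustrative, not part of any proposal.
// The prefix is the leading spaces/tabs of line DE. Scanning stops at the
// first other character, so passing the whole line including `"""` also works.
func indentPrefix(ofClosingLine closingLine: String) -> String {
    return String(closingLine.prefix(while: { $0 == " " || $0 == "\t" }))
}

// Strips the DE prefix from every content line s0...sn that carries it.
func stripIndent(_ lines: [String], closingLine: String) -> [String] {
    let prefix = indentPrefix(ofClosingLine: closingLine)
    return lines.map { line in
        // Indentation deeper than the prefix stays part of the content.
        line.hasPrefix(prefix) ? String(line.dropFirst(prefix.count)) : line
    }
}

let stripped = stripIndent(["    foo", "      bar"], closingLine: "    ")
```

`stripped` is `["foo", "  bar"]`. The extra two spaces before `bar` survive because only the prefix taken from line DE is removed.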

Right (or trailing) precision of each line from s0 to sn (note that n equals 2 
in the example above) is handled by a backslash character (2. compromise).

Right precision comes at the price of losing the implicit new line injection; 
however, this was also a requested feature (3. compromise). That means the 
backslash serves two purposes simultaneously.

New line injection happens only on lines s0 to s(n - 1) (4. and last compromise 
of the design). The last line sn (s2 above) does not inject a new line into 
the final string. On this line a backslash therefore only provides right 
precision; it is reduced to a single function.
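Compromises 2 to 4 come down to a per-line decision. The sketch below uses a made-up `ContentLine` type and assumes indentation has already been stripped. It treats a line ending in an odd number of backslashes as carrying the right-precision marker, because an even number is just escaped backslashes. It leaves trailing-whitespace trimming out.

```swift
// Hypothetical per-line model; the type and function names are invented.
struct ContentLine {
    let text: String             // line text with the marker backslash removed
    let keepsTrailingSpace: Bool // right precision requested with `\`
    let injectsNewline: Bool     // whether a new line follows this line
}

func classify(_ lines: [String]) -> [ContentLine] {
    var result: [ContentLine] = []
    for (index, line) in lines.enumerated() {
        // `\\` is an escaped backslash, so only an odd count ends in the marker.
        let hasMarker = line.reversed().prefix(while: { $0 == "\\" }).count % 2 == 1
        let isLast = index == lines.count - 1
        result.append(ContentLine(
            text: hasMarker ? String(line.dropLast()) : line,
            keepsTrailingSpace: hasMarker,         // 2. compromise
            injectsNewline: !hasMarker && !isLast  // 3. and 4. compromise
        ))
    }
    return result
}

let lines = classify(["foo\\", "bar", "baz\\"])
```

In this example, `foo` keeps its trailing whitespace and injects nothing. `bar` injects a new line. On the last line, `baz`, the backslash only serves precision.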

Important:

Because whitespace is important to these examples, it is explicitly indicated: 
· is a space, ⇥ is a tab, and ↵ is a newline.

Leading/trailing precision and indent (1. and 2. compromise):

// Nothing to strip in this example (no indent).
let str_1 = """↵  
foo↵
"""

// No right precision (no backslash) -> trailing whitespace will be stripped.
let str_2 = """↵  
foo··↵
"""

// Same as `str_2`
let str_3 = """↵  
foo····↵
"""

// Line `DE` of the closing delimiter calculates the indent prefix  
// `··` and strips it from `s0` (left precision).
let str_4 = """↵  
··foo↵
··"""

// Line `DE` of the closing delimiter calculates the indent prefix  
// `····` and strips it from `s0` (left precision).
// No right precision (no backslash) -> trailing whitespace will be stripped.
let str_5 = """↵  
····foo··↵
····"""

// Line `DE` of the closing delimiter calculates the indent prefix  
// `⇥ ⇥ ` and strips it from `s0` (left precision).
// Right precision is applied (backslash). In this case the literal
// contains only a single line of content, which also happens to be
// the last line before `DE` -> the backslash only serves precision.
let str_6 = """↵  
⇥ ⇥ foo\↵
⇥ ⇥ """

// Line `DE` of the closing delimiter calculates the indent prefix  
// `·⇥ ·⇥ ` and strips it from `s0` (left precision).
// No right precision (no backslash) -> trailing whitespace will be stripped.
let str_7 = """↵  
·⇥ ·⇥ foo··↵
·⇥ ·⇥ """

let string_1 = "foo"

str_1 == string_1   // => true
str_2 == string_1   // => true
str_3 == string_1   // => true
str_4 == string_1   // => true
str_5 == string_1   // => true
str_6 == string_1   // => true
str_7 == string_1   // => true
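Putting the rules together, here is a hypothetical evaluator that reproduces the seven results above. It is not a compiler API, and it rests on these assumptions:

* The lines between DS and DE are passed as `body`, and the text of line DE before `"""` is passed as `closing`.
* · is written as a space, and each `⇥ ` is read as a single tab.
* Escape sequences other than the trailing marker are not processed.
* A line that carries only part of the prefix has that part stripped.

```swift
// Hypothetical model of the proposed rules; not a real compiler or stdlib API.
func evaluateLiteral(body: [String], closing: String) -> String {
    // Left precision: the indent prefix is the leading whitespace of line DE.
    let indent = closing.prefix(while: { $0 == " " || $0 == "\t" })
    var result = ""
    for (index, raw) in body.enumerated() {
        // Strip as much of the indent prefix as this line actually has.
        let matched = zip(raw, indent).prefix(while: { pair in pair.0 == pair.1 }).count
        var line = String(raw.dropFirst(matched))
        // An odd number of trailing backslashes means the last one is the marker.
        let hasMarker = line.reversed().prefix(while: { $0 == "\\" }).count % 2 == 1
        if hasMarker {
            line.removeLast()  // right precision: keep trailing whitespace
        } else {
            while let last = line.last, last == " " || last == "\t" {
                line.removeLast()  // no marker: strip trailing whitespace
            }
        }
        result += line
        // New line injection on s0...s(n - 1) only, and never after a marker.
        if !hasMarker && index < body.count - 1 {
            result += "\n"
        }
    }
    return result
}

let str_1 = evaluateLiteral(body: ["foo"], closing: "")
let str_2 = evaluateLiteral(body: ["foo  "], closing: "")
let str_3 = evaluateLiteral(body: ["foo    "], closing: "")
let str_4 = evaluateLiteral(body: ["  foo"], closing: "  ")
let str_5 = evaluateLiteral(body: ["    foo  "], closing: "    ")
let str_6 = evaluateLiteral(body: ["\t\tfoo\\"], closing: "\t\t")
let str_7 = evaluateLiteral(body: [" \t \tfoo  "], closing: " \t \t")
let string_1 = "foo"
```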
An incorrectly indented multi-line string literal, which compiles but emits a 
warning and provides a fix-it:

let str_8 = """↵  
··foo↵
····"""

str_8 == string_1   // => true
warning: missing indentation in multi-line string literal
  ··foo!
    ^  
  Fix-it: Insert "··"
The stripping algorithm calculates the prefix indent from the closing delimiter 
line DE and tries to strip it from lines s0 to sn; every line that cannot be 
handled correctly emits an individual warning and a fix-it.
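Here is a hypothetical sketch of the per-line diagnostics. It checks how much of the DE prefix each content line carries and reports the missing remainder as the text a fix-it would insert. The names are invented. The sketch assumes the missing indent belongs right after the part already present, and it does not special-case empty lines.

```swift
// Hypothetical diagnostic model; names are illustrative only.
struct IndentWarning {
    let lineIndex: Int     // index of the offending line among s0...sn
    let insertion: String  // whitespace a fix-it would insert after the matched part
}

func indentWarnings(body: [String], closing: String) -> [IndentWarning] {
    // Expected indent: the leading spaces/tabs of line DE.
    let expected = Array(closing.prefix(while: { $0 == " " || $0 == "\t" }))
    var warnings: [IndentWarning] = []
    for (index, line) in body.enumerated() {
        let chars = Array(line)
        // How many characters of the expected prefix does this line carry?
        var matched = 0
        while matched < expected.count && matched < chars.count && chars[matched] == expected[matched] {
            matched += 1
        }
        if matched < expected.count {
            warnings.append(IndentWarning(lineIndex: index,
                                          insertion: String(expected[matched...])))
        }
    }
    return warnings
}

// str_8: "··foo" above a closing delimiter indented by "····".
let str8Warnings = indentWarnings(body: ["  foo"], closing: "    ")
```

For `str_8`, this yields one warning on `s0` whose insertion is two spaces, matching the `Insert "··"` fix-it shown above.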

The stripping algorithm removes all whitespace at the end of each line from s0 
to sn unless right precision is annotated with a backslash, as in 
··foo··\↵. This behavior is essential and aligns well with the precision 
behavior of a standard string literal " "; otherwise a multi-line string 
literal like

"""
foo
"""
could contain 3, 10, or even 1000 characters, and the developer couldn’t tell 
or even approximately guess.

The correct way of fixing this, as mentioned above, is to strip all whitespace 
after the last non-space character of the line, unless right precision is 
explicitly annotated with a backslash.

"""
foo   \
"""
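Here is the trailing-whitespace rule in isolation, as a hypothetical helper applied to one content line after indent stripping. Invisible trailing spaces and tabs are dropped unless the line ends in the marker backslash. In that case only the backslash is dropped.

```swift
// Hypothetical helper; not part of any real API.
func applyRightPrecision(_ line: String) -> String {
    var text = line
    // An odd number of trailing backslashes means the last one is the marker.
    let backslashes = text.reversed().prefix(while: { $0 == "\\" }).count
    if backslashes % 2 == 1 {
        text.removeLast()  // marker: keep the whitespace in front of it
        return text
    }
    while let last = text.last, last == " " || last == "\t" {
        text.removeLast()  // no marker: strip invisible trailing whitespace
    }
    return text
}

// However many spaces trail "foo", the value without a marker is "foo".
let plain = applyRightPrecision("foo" + String(repeating: " ", count: 997))
let marked = applyRightPrecision("foo   \\")
```

`plain` has exactly 3 characters, while `marked` keeps its three trailing spaces because they were explicitly requested.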
Disabling new line injection (3. compromise):

The examples we’ve used so far had only a single content line, so we couldn’t 
showcase the behavior yet. New lines are only injected into a multi-line string 
if it has at least two content lines.

let str_9 = """↵  
····foo↵
····bar↵
····"""

let str_10 = """↵  
····foo↵
····bar↵
····baz↵
····"""

let string_2 = "foo\nbar"
let string_3 = "foo\nbar\nbaz"

str_9 == string_2  // => true
str_10 == string_3 // => true
To disable new line injection one would need to use the backslash for right 
precision.

let str_11 = """↵  
····foo\↵
····bar↵
····"""

let str_12 = """↵  
····foo\↵
····bar\↵
····baz↵
····"""

str_11 == string_2    // => false
str_12 == string_3    // => false

str_11 == "foobar"    // => true
str_12 == "foobarbaz" // => true
Remember that the last content line sn does not automatically inject a new line 
into the final string!
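For these multi-line examples only the joining step matters. The hypothetical sketch below assumes the `····` prefix has already been stripped and that the lines carry no trailing whitespace. It shows how the marker backslash suppresses new line injection.

```swift
// Hypothetical sketch; not a compiler API.
func assemble(_ lines: [String]) -> String {
    var result = ""
    for (index, line) in lines.enumerated() {
        // An odd number of trailing backslashes marks right precision.
        let hasMarker = line.reversed().prefix(while: { $0 == "\\" }).count % 2 == 1
        if hasMarker {
            result += String(line.dropLast())  // drop the marker, inject nothing
        } else {
            result += line
            if index < lines.count - 1 {
                result += "\n"                 // s0...s(n - 1) inject a new line
            }
        }
    }
    return result
}

let str_9 = assemble(["foo", "bar"])
let str_10 = assemble(["foo", "bar", "baz"])
let str_11 = assemble(["foo\\", "bar"])
let str_12 = assemble(["foo\\", "bar\\", "baz"])
```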

New line injection except for the last line (4. compromise):

A standard string literal like "foo" only contains the string content between 
its starting and closing delimiters. The discussion on the mailing list 
suggests that the multi-line string literal should go the same route and not 
inject a new line for the last content line sn. str_9 is a good example of 
that behavior.

Now, if one wants a new line at the end of the string, there are a few 
options to achieve this:

// Natural way:
let str_13 = """↵  
····foo↵
····bar↵
····↵
····"""

// Remember the last content line does not inject a `\n` character by default
// so there is no need for `\n\` here (but it's possible as well)!
let str_14 = """↵  
····foo↵
····bar\n↵
····"""

str_13 == "foo\nbar\n" // => true
str_14 == "foo\nbar\n" // => true
At first glance, the behavior of str_13 seems odd and inconsistent; however, it 
actually mimics the natural way of writing text paragraphs perfectly.

[here is a blank line]↵
text text text text text↵
text text text text text↵
[here is a blank line]
This is easily expressed with the literal model described above:

let myParagraph = """↵
····↵
····text text text text text↵
····text text text text text↵
····↵
····"""
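Compromise 4 is exactly the difference between joining lines with a separator and terminating every line. The hypothetical sketch below contrasts the two on de-indented lines that have no markers. The second function models the keep-every-newline behavior argued for elsewhere in this thread. Neither function is a real API.

```swift
// This model: a new line after every content line except the last one, s(n).
func joinSkippingLastNewline(_ lines: [String]) -> String {
    return lines.joined(separator: "\n")
}

// Alternative: every content line, including s(n), ends with a new line.
func joinKeepingEveryNewline(_ lines: [String]) -> String {
    return lines.map { $0 + "\n" }.joined()
}

let line = "text text text text text"
let str_13 = joinSkippingLastNewline(["foo", "bar", ""])         // empty last line
let myParagraph = joinSkippingLastNewline(["", line, line, ""])  // blank, text, text, blank
```

Under this model, an empty last content line is what produces the trailing `\n`, so the paragraph starts and ends with a blank line just like the text layout it mimics.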


-- 
Adrian Zubarev

On 13 April 2017 at 02:39:51, Xiaodi Wu (xiaodi...@gmail.com) wrote:

On Wed, Apr 12, 2017 at 5:20 PM, Brent Royal-Gordon <br...@architechies.com> 
wrote:
Wow, maybe I shouldn't have slept.

Okay, let's deal with trailing newline first. I'm *very* confident that 
trailing newlines should be kept by default. This opinion comes from lots of 
practical experience with multiline string features in other languages. In 
practice, if you're generating files in a line-oriented way, you're usually 
generating them a line at a time. It's pretty rare that you want to generate 
half a line and then add more to it in another statement; it's more likely 
you'll interpolate the data. I'm not saying it doesn't happen, of course, but 
it happens a lot less often than you would think just sitting by the fire, 
drinking whiskey and musing over strings.

I know that, if you're pushing for this feature, it's not satisfying to have 
the answer be "trust me, it's not what you want". But trust me, it's not what 
you want.

This is not a very good argument. If you are generating files in a 
line-oriented way, it is the function _emitting_ the string that handles the 
line-orientedness, not the string itself. That is the example set by `print()`:

```
print("Hello, world!") // Emits "Hello, world!\n"
```

Once upon a time, if I recall, this function was called `println`, but it was 
renamed. This particular example demonstrates why keeping trailing newlines by 
default is misguided:

```
print(
  """
  Hello, world!
  """
)
```

Under your proposed rules, this emits "Hello, world!\n\n". It is almost 
certainly not what you want. Instead, it is a misguided attempt by the 
designers of multiline string syntax to do the job that the designers of 
`print()` have already accounted for.
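This interaction is easy to check with the standard library's `print(_:separator:terminator:to:)`, printing into a `String` (which conforms to `TextOutputStream`). The constant below only stands in for the value the literal would have if it kept its trailing newline. The proposed syntax itself is not used.

```swift
// Stand-in for the literal's value under keep-the-trailing-newline rules.
let helloLiteral = "Hello, world!\n"

var withDefaultTerminator = ""
print(helloLiteral, to: &withDefaultTerminator)  // print appends its own "\n"

var withEmptyTerminator = ""
print(helloLiteral, terminator: "", to: &withEmptyTerminator)
```

`withDefaultTerminator` ends in two newlines, while an empty `terminator` is needed to get back a single one.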

If we were to buy your line of reasoning and adapt it for single-line strings, 
we would arrive at a rather absurd result. If you're emitting multiple 
single-line strings, you almost certainly want a space to separate them. Again 
this is exemplified by the behavior of `print()`:

```
print("Hello", "Brent!")
```

This emits "Hello Brent!" (and not "HelloBrent!"). But we do not take that 
reasoning and demand that "This is my string" end with a default trailing 
space, nor do we have `+` concatenate strings by default with a separating 
space.
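The same contrast holds for separators. `print` inserts its default `" "` separator between items, while `+` adds nothing.

```swift
var printed = ""
print("Hello", "Brent!", to: &printed)  // default separator " ", terminator "\n"

let concatenated = "Hello" + "Brent!"   // `+` adds no separator
```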


Moving to the other end, I think we could do a leading newline strip *if* we're 
willing to create multiline and non-multiline modes—that is, newlines are _not 
allowed at all_ unless the opening delimiter ends its line and the closing 
delimiter starts its line (modulo indentation). But I'm reluctant to do that 
because, well, it's weird and complicated. I also get the feeling that, if 
there's a single-line mode and a multi-line mode, we ought to treat them as 
truly orthogonal features and allow `"`-delimited strings to use multi-line 
mode, but I'm really not convinced that's a good idea.

(Note, by the way, that heredocs—a *really* common multiline string 
design—always strip the leading newline but not the trailing one.)

Adrian cited this example, where I agree that you really don't want the string 
to be on the same line as the leading delimiter:

let myReallyLongXMLConstantName = """<?xml version="1.0"?>
                                     <catalog>
                                        <book id="bk101" empty="">
                                           <author>John Doe</author>
                                           <title>XML Developer's Guide</title>
                                           <genre>Computer</genre>
                                           <price>44.95</price>
                                        </book>
                                     </catalog>\
                                     """        

But there are lots of places where it works fine. Is there a good reason to 
force an additional newline in this?

case .isExprSameType(let from, let to):
return """checking a value with optional type \(from) against dynamic type \(to) \
      succeeds whenever the value is non-'nil'; did you mean to use '!= nil'?\
      """

I mean, we certainly could, but I'm not convinced we should. At least, not yet.

In any case, trailing newline definitely stays. Leading newline, I'm still 
thinking about.

As for other things:

* I see zero reason to fiddle with trailing whitespace. If it's there, it might 
be significant or it might not be. If we strip it by default and we shouldn't, 
the developer has no way to protect it. Let's trust the developer. (And their 
tooling—Xcode, I believe Git, and most linters already have trailing whitespace 
features. We don't need them too.)

* Somebody asked about `"""`-delimited heredocs. I think it's a pretty syntax, 
but it's not compatible with single-line use of `"""`, and I think that's 
probably more important. We can always add heredocs in another way if we decide 
we want them. (I think `#to(END)` is another very Swifty syntax we could use 
for heredocs--less lightweight, but it gives us a Google-able keyword.)

* Literal spaces and tabs cannot be backslashed. This is really important 
because, if you see a backslash after the last visible character in a line, you 
can't tell just by looking whether the next character is a space, tab, or 
newline. So the solution is, if it's not a newline, it's not valid at all.
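Here is a hypothetical lint for the rule about literal spaces and tabs. It scans the raw source text of a literal and flags any backslash that is not itself escaped and is followed by a space or a tab. The function name and the offset-based reporting are illustrative assumptions.

```swift
// Hypothetical lint; the name and offset-based reporting are illustrative.
// Returns the character offsets of backslashes followed by a space or tab.
func invalidBackslashOffsets(in source: String) -> [Int] {
    let chars = Array(source)
    var offsets: [Int] = []
    var i = 0
    while i < chars.count {
        if chars[i] == "\\" {
            let next: Character? = i + 1 < chars.count ? chars[i + 1] : nil
            if next == " " || next == "\t" {
                offsets.append(i)
            }
            i += 2  // skip the escaped character, so `\\` is consumed as a pair
        } else {
            i += 1
        }
    }
    return offsets
}

let flagged = invalidBackslashOffsets(in: "foo\\ \nbar")  // backslash, space, newline
```

A backslash directly before the newline passes. An escaped backslash followed by a space is also fine, because that space is ordinary content.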

I'll respond to Jarod separately.

On Apr 12, 2017, at 12:07 PM, John Holdsworth <m...@johnholdsworth.com> wrote:

Finally... a new Xcode toolchain is available, largely in sync with the proposal 
as is.
(You need to restart Xcode after selecting the toolchain to restart SourceKit)

I personally am undecided whether to remove the first line if it is empty. The 
new rules are more consistent but somehow less practical. A blank initial line 
is almost never what a user would want, and I would tend towards removing it 
automatically. This is almost always what a user would expect it to do.

I’m less sure the same applies to the trailing newline. If this is a syntax for
multi-line strings, I'd argue that they should normally be complete lines -
particularly since the final newline can so easily be escaped.

        let longstring = """\
            Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod \
            tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, \
            quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\
            """

        print( """\
            Usage: myapp <options>
            
            Run myapp to do mything
            
            Options:
            -myoption - an option
            """ )

(An explicit "\n" in the string should never be stripped, btw.)

Can we have a straw poll for the three alternatives:

1) Proposal as it stands - no magic removal of leading/trailing blank lines.
2) Removal of a leading blank line when indent stripping is being applied.
3) Removal of leading blank line and trailing newline when indent stripping is 
being applied.

My vote is for the pragmatic path: 2)

(The main intent of this revision was actually removing the link between how the
string started and whether indent stripping was applied which was unnecessary.)
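Here is a hypothetical sketch of what the three straw-poll options would do to the raw text between the delimiters after indent stripping. It ignores the "when indent stripping is being applied" condition and only looks at a single leading or trailing newline. The enum and function names are invented.

```swift
// Hypothetical model of the straw-poll options; names are invented.
enum NewlinePolicy {
    case asIs                     // 1) no magic removal
    case stripLeading             // 2) remove a leading blank line
    case stripLeadingAndTrailing  // 3) also remove the trailing newline
}

func apply(_ policy: NewlinePolicy, to raw: String) -> String {
    var text = Substring(raw)
    // Options 2 and 3: drop the newline that follows the opening delimiter.
    if policy != .asIs, text.first == "\n" {
        text = text.dropFirst()
    }
    // Option 3 only: drop the newline that precedes the closing delimiter.
    if policy == .stripLeadingAndTrailing, text.last == "\n" {
        text = text.dropLast()
    }
    return String(text)
}

// Raw text between the delimiters of a literal with one content line.
let raw = "\nHello, world!\n"
```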

On 12 Apr 2017, at 17:48, Xiaodi Wu via swift-evolution 
<swift-evolution@swift.org> wrote:

Agree. I prefer the new rules over the old, but considering common use cases, 
stripping the leading and trailing newline makes for a more pleasant experience 
than not stripping either of them.

I think that is generally worth prioritizing over a simpler algorithm or even 
accommodating more styles. Moreover, a user who wants a trailing or leading 
newline merely types an extra one if there is newline stripping, so no use 
cases are made difficult, only a very common one is made more ergonomic.


-- 
Brent Royal-Gordon
Architechies


_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
