Can we try to enumerate the potential hazards and potential useful features 
associated with multi-line strings?  Then perhaps you can judge various 
proposals based on them.


Potential hazards:

H1) Forgotten '+' (plus).  This affects the current idiom of concatenating
string fragments on adjacent lines: if you forget to put + between two
fragments, part of your intended string goes missing, with only a compiler
warning.  (Making this a compiler error was rejected in the thread
"Disallowing many expressions with unused result".)
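
To make H1 concrete, here is a minimal sketch in current Swift.  The query text and the names intended and truncated are made up for illustration; the exact wording of the compiler warning is not reproduced here.

```swift
// Intended: one string built from two fragments on adjacent lines.
let intended = "SELECT * FROM users "
    + "WHERE id = 1"

// Forgotten '+': the second literal becomes a separate expression statement.
// This still compiles; the compiler only warns that the literal is unused.
let truncated = "SELECT * FROM users "
"WHERE id = 1"

print(intended)
print(truncated)
```

The second declaration is accepted, so the missing fragment is easy to overlook unless warnings are read carefully.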

H2) Forgotten ',' (comma).  If adjacent string tokens (potentially on separate
lines) are implicitly concatenated (as in C/C++), that makes it easy to
mis-specify arrays of strings, as in
    var a = [
      "a",
      "b"
      "c"]
which would have only two elements ("a" and "bc").  This could also affect
intended tuples, but with much higher likelihood of being caught by the
compiler.

H3) No recovery for tokenization / syntax highlighting.  IMHO, this is the big
drawback of Python-style """ strings.  If you jump to an arbitrary point in the
source code, you don't know whether you're inside a """, and AFAIK there's no
reliable, automatic way to figure out whether the next """ enters or exits a
multi-line string.  As someone who has built syntax highlighters, I find it
really nice when the tokenizer state after a newline is predictable (as in
languages like Java or C#, where it is either the default state or a multiline
comment).  Yes, multiline comments are kind of ugly, but they at least tend to
be self-correcting because the entry and exit character sequences are
different!  Requiring some kind of continuation character generally
facilitates immediate recovery.
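
As a rough illustration of the self-correcting property described in H3, the sketch below guesses the state at the start of an arbitrary window of source text from the first delimiter it finds.  The names StateGuess and guessState and the sample inputs are hypothetical, and this is only a heuristic, not a real tokenizer: it ignores nesting and delimiters that happen to appear inside other tokens.

```swift
import Foundation

// Hypothetical result type: where the window most likely starts.
enum StateGuess { case inside, outside, unknown }

// Guess whether `window` begins inside a delimited region by checking
// which delimiter appears first in it.
func guessState(of window: String, open: String, close: String) -> StateGuess {
    if open == close {
        // Identical delimiters: the first one seen could be entering
        // or exiting, so nothing can be concluded locally.
        return .unknown
    }
    let firstOpen = window.range(of: open)?.lowerBound
    let firstClose = window.range(of: close)?.lowerBound
    switch (firstOpen, firstClose) {
    case let (o?, c?): return c < o ? .inside : .outside
    case (nil, _?): return .inside    // only an exit sequence is visible
    case (_?, nil): return .outside   // only an entry sequence is visible
    case (nil, nil): return .unknown  // no delimiter in the window at all
    }
}

let tripleQuote = "\"\"\""
print(guessState(of: "tail of comment */ let x = 1", open: "/*", close: "*/"))
print(guessState(of: "text \"\"\" more text", open: tripleQuote, close: tripleQuote))
```

With distinct entry and exit sequences the first delimiter usually settles the question; with """ on both ends it never does, which is the hazard above.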

H4) Unclear escaping / newline / indentation semantics.  This has been under 
heavy discussion and I don't have much to add.


Potentially useful features:

F1) Interpolation.  A multi-line string syntax is less valuable if it's
difficult to embed evaluated code.

F2) Raw embedding.  A multi-line string syntax is less valuable if it's
difficult to construct literals from raw strings, because of the need for
escaping / continuation characters etc.


Another imperfect proposal:

Support two forms of multiline strings:
* one with escaping and interpolation, in which newlines are embedded only via
\n, delimited with \\\ and ///
* one with no escaping, in which newlines as written are included in the
string, delimited with ``` and '''
Both forms would be subject to further restrictions to aid readability and use
of indentation:
* The three-character begin or end delimiter must be on a line by itself, with
only optional leading whitespace (spaces and/or tabs).
* Each line up to and including the end delimiter must exactly copy the leading
whitespace used for the begin delimiter, and that whitespace is not included in
the contents of the parsed string literal.

For example:
    var x =
        ```
        <?xml version="1.0"?>
        <path>
          C:\Foo
        </path>
        '''

Or:
    send(
        \\\
        <?xml version=\"1.0\"?>\n
        <path>\n  \(path)\n</path>\n
        ///
    )

Or:
    let f =
```
<paste in just about any file here>
'''

Maybe there should be a way to omit the trailing newline from a ``` ''' string, 
but I don't have a specific proposal.
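
The whitespace rule for the raw form could be sketched roughly as below.  This is only one possible reading of the proposal, not a specification: the names parseRawLiteral, beginDelimiter and endDelimiter are hypothetical, the sketch assumes every content line (including blank ones) must repeat the begin delimiter's indentation, and it assumes each content line contributes a trailing newline.

```swift
// Delimiters of the raw form: three backticks to begin, three single
// quotes to end.
let beginDelimiter = String(repeating: "`", count: 3)
let endDelimiter = "'''"

// Hypothetical parser: `lines` starts at the begin-delimiter line.
// Returns the literal's contents, or nil if a rule is violated.
func parseRawLiteral(_ lines: [String]) -> String? {
    guard let first = lines.first else { return nil }
    // Leading whitespace (spaces and/or tabs) of the begin delimiter.
    let indent = String(first.prefix(while: { $0 == " " || $0 == "\t" }))
    guard first.dropFirst(indent.count) == beginDelimiter else { return nil }

    var contents = ""
    for line in lines.dropFirst() {
        // Each line must exactly copy the begin delimiter's whitespace.
        guard line.hasPrefix(indent) else { return nil }
        let rest = line.dropFirst(indent.count)
        if rest == endDelimiter { return contents }
        // The copied whitespace is stripped; the rest is kept verbatim.
        contents.append(contentsOf: rest)
        contents.append("\n")
    }
    return nil  // ran out of lines before the end delimiter
}

let example = [
    "        " + beginDelimiter,
    "        <path>",
    "          C:\\Foo",
    "        </path>",
    "        " + endDelimiter,
]
print(parseRawLiteral(example) ?? "rule violated")
```

Omitting the trailing newline, as mused above, would need either a different end delimiter or an extra rule; the sketch does not attempt that.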


-- 
Peter Dillinger, Ph.D.
Software Engineering Manager, Coverity Analysis, Software Integrity Group | 
Synopsys
www.synopsys.com/software
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
