On Nov 23, 2017, at 10:35 AM, Xiaodi Wu via swift-evolution 
<swift-evolution@swift.org> wrote:
> This proposed addition addresses a known pain point, to be sure, but I think 
> it has many implications for the future direction of the language and I'd 
> like to explore them here.

Thanks for writing this up, Xiaodi,

> We should certainly move any discussion about regex literals into its own 
> thread, but to make it clear that I'm not simply suggesting that we implement 
> something in Swift 10 instead of addressing a known pain point now, here's a 
> sketch of how Swift 5 could make meaningful progress:
> 
> - Teach the lexer about basic /pattern/flag syntax.
> - Add an `ExpressibleByRegularExpressionLiteral`, where the initializer would 
> be something like `init(regularExpressionLiteralPattern: String, flags: 
> RegularExpressionFlags)` where RegularExpressionFlags would be an OptionSet 
> type.
> - Add conformance to `ExpressibleByRegularExpressionLiteral` to 
> `NSRegularExpression`.
> - Have no default `RegularExpressionLiteralType` for now so that, in the 
> future, we can discuss and design a Swift standard library regular expression 
> type, which is justifiable because we've baked in language support for the 
> literal. This can be postponed.
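
To make the quoted list concrete, here is a rough sketch of what such a protocol could look like. The protocol name and initializer label come from the list above; the individual flag names and the `WrappedRegularExpression` type are assumptions made only for illustration (a wrapper struct stands in for the `NSRegularExpression` conformance so the sketch is self-contained), and the lexer support for /pattern/flag does not exist, so the initializer is called directly.

```swift
import Foundation

// Hypothetical flag set; the individual flag names are assumptions.
struct RegularExpressionFlags: OptionSet {
    let rawValue: Int
    static let caseInsensitive = RegularExpressionFlags(rawValue: 1 << 0)
    static let multiline = RegularExpressionFlags(rawValue: 1 << 1)
}

// The literal protocol as described in the quoted list.
protocol ExpressibleByRegularExpressionLiteral {
    init(regularExpressionLiteralPattern: String, flags: RegularExpressionFlags)
}

// Hypothetical wrapper standing in for a direct NSRegularExpression conformance.
struct WrappedRegularExpression: ExpressibleByRegularExpressionLiteral {
    let expression: NSRegularExpression

    init(regularExpressionLiteralPattern pattern: String, flags: RegularExpressionFlags) {
        var options: NSRegularExpression.Options = []
        if flags.contains(.caseInsensitive) { options.insert(.caseInsensitive) }
        if flags.contains(.multiline) { options.insert(.anchorsMatchLines) }
        // Assumption: a real literal would already be validated; this sketch traps on a bad pattern.
        expression = try! NSRegularExpression(pattern: pattern, options: options)
    }

    // Returns true if the pattern matches anywhere in the input.
    func matches(_ input: String) -> Bool {
        let range = NSRange(input.startIndex..., in: input)
        return expression.firstMatch(in: input, options: [], range: range) != nil
    }
}

// What a literal such as /^hello$/i might desugar to under this sketch.
let greeting = WrappedRegularExpression(regularExpressionLiteralPattern: "^hello$",
                                        flags: [.caseInsensitive])
```
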

This approach could make sense, but it makes a couple of assumptions that I’m 
not certain are the right way to go (to be clear, I’m not certain that they’re 
wrong either!).

Things I’d like to carefully consider:

1) We could make the compiler parse and validate regex literals at compile time:

a) This allows the compiler to emit diagnostics (with fix-its!) on malformed 
literals.

b) When the compiler knows the grammar of the regex, it can precompile the 
regex into a DFA table or static executable code, rather than compiling it 
into bytecode at runtime.

c) However, the compiler can’t parse the literal unless it knows the dialect it 
corresponds to.  While we could parameterize this somehow (e.g. as a 
requirement in ExpressibleByRegularExpressionLiteral), if we weren’t bound by 
backwards compatibility, we would just keep things simple and say “there is one 
and only one grammar”.  I’d argue that having exactly one grammar supported by 
the // syntax is also *better* for users, rather than saying “it depends on 
what library you’re passing the regex into”.
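
As a rough illustration of point (b), here is the kind of table a compiler could in principle emit ahead of time. The pattern /[0-9]+(\.[0-9]+)?/, the state numbering, and every name below are assumptions chosen for the example; the table is written by hand, not produced by any existing tool.

```swift
// Input classes for the hypothetical pattern /[0-9]+(\.[0-9]+)?/.
enum InputClass: Int {
    case digit = 0
    case dot = 1
    case other = 2
}

// Maps a UTF-8 byte to its input class.
func classify(_ byte: UInt8) -> InputClass {
    if byte >= 48 && byte <= 57 { return .digit }  // ASCII "0"..."9"
    if byte == 46 { return .dot }                  // ASCII "."
    return .other
}

let dead = -1

// Rows are states, columns are input classes (digit, dot, other).
// 0: start, 1: integer digits, 2: just saw ".", 3: fraction digits.
let transitions: [[Int]] = [
    [1, dead, dead],
    [1, 2, dead],
    [3, dead, dead],
    [3, dead, dead],
]
let acceptingStates: Set<Int> = [1, 3]

// Runs the table over the whole input; there is no regex engine involved at runtime.
func matchesDecimal(_ input: String) -> Bool {
    var state = 0
    for byte in input.utf8 {
        state = transitions[state][classify(byte).rawValue]
        if state == dead { return false }
    }
    return acceptingStates.contains(state)
}
```

With the grammar fixed, a malformed literal could be rejected while building such a table, which is where the diagnostics in (a) would come from.
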


2) I’d like to explore the idea of making // syntax be *patterns* instead of 
simply literals.  As a pattern, it should be possible to bind submatches 
directly into variable declarations, eliminating the need to count parens in 
matches or other gross things.  Here is strawman syntax with a dumb example:

if case /([a-zA-Z]+: let firstName) ([a-zA-Z]+: let lastName)/ = getSomeString() {
   print(firstName, lastName)
}
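
For contrast, this is roughly what the same extraction looks like today with `NSRegularExpression`, where the submatches have to be pulled out by counting capture groups. The function name `splitName` and the `^`/`$` anchors are assumptions added for the example.

```swift
import Foundation

// Extracts two alphabetic words separated by a single space, by capture group index.
func splitName(_ input: String) -> (firstName: String, lastName: String)? {
    // Anchored so the whole string must match; the strawman above does not say either way.
    guard let regex = try? NSRegularExpression(pattern: "^([a-zA-Z]+) ([a-zA-Z]+)$",
                                               options: []) else {
        return nil
    }
    let fullRange = NSRange(input.startIndex..., in: input)
    guard let match = regex.firstMatch(in: input, options: [], range: fullRange),
          // Group 1 and group 2 are identified only by their position in the pattern.
          let firstRange = Range(match.range(at: 1), in: input),
          let lastRange = Range(match.range(at: 2), in: input) else {
        return nil
    }
    return (String(input[firstRange]), String(input[lastRange]))
}

if let name = splitName("Ada Lovelace") {
    print(name.firstName, name.lastName)
}
```
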

3) I see regex string matching as the dual to string interpolation.  We already 
provide the ability for types to specify a default way to print themselves, and 
it would be great to have default regexes associated with many types, so you 
can just say “match an Int here” instead of having to match [0-9]+ and then do 
a failable conversion to Int outside the regex.
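
One way to picture this is a protocol through which a type advertises its own pattern and conversion. Everything named below (`DefaultRegexMatchable`, `defaultPattern`, `init?(matchedText:)`, `firstValue(of:in:)`) is hypothetical and only approximates the idea with Foundation's existing regex search; it is not a proposed design.

```swift
import Foundation

// Hypothetical protocol: a type supplies a default pattern and a failable conversion.
protocol DefaultRegexMatchable {
    static var defaultPattern: String { get }
    init?(matchedText: String)
}

extension Int: DefaultRegexMatchable {
    static var defaultPattern: String { return "[0-9]+" }

    init?(matchedText: String) {
        // Delegates to the standard failable Int(String) initializer.
        self.init(matchedText)
    }
}

// Finds the first substring matching the type's default pattern and converts it.
func firstValue<T: DefaultRegexMatchable>(of type: T.Type, in input: String) -> T? {
    guard let range = input.range(of: T.defaultPattern, options: .regularExpression) else {
        return nil
    }
    return T(matchedText: String(input[range]))
}

let width = firstValue(of: Int.self, in: "width=42px")
```

The caller asks for "an Int" and never writes [0-9]+ or the conversion by hand, which is the symmetry with string interpolation described above.
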


4) I’d like to consider some of the advances that Perl 6 added to its regex 
grammar.  Everyone knows that modern regexes aren’t actually regular anyway, so 
it raises the question of how far to take it.  If nothing else, I appreciate the 
freeform structure supported (including inline comments), which makes them more 
readable.
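
As a loose analogue of the freeform style (this is not Perl 6 syntax), `NSRegularExpression` already has an `allowCommentsAndWhitespace` option that ignores whitespace in the pattern and treats `#` as the start of a comment. The pattern and the function name `matchesFullName` are assumptions for the example.

```swift
import Foundation

// The same two-word pattern, laid out one piece per line with comments.
let commentedPattern = """
([a-zA-Z]+)   # first name
\\s+          # separating whitespace
([a-zA-Z]+)   # last name
"""

// Returns true if the input contains two alphabetic words separated by whitespace.
func matchesFullName(_ input: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: commentedPattern,
                                               options: [.allowCommentsAndWhitespace]) else {
        return false
    }
    let range = NSRange(input.startIndex..., in: input)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}
```
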

We should also support a dynamic regex engine, because there are 
sometimes reasons to construct regexes at runtime.  This could be handled by 
having the Regex type support a conversion from String or something, orthogonal 
to the language support for regex literals/patterns.
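
A minimal sketch of the runtime path, assuming a hypothetical `DynamicRegex` type backed by `NSRegularExpression`. Because the pattern is only known at runtime, a malformed one surfaces as a thrown error rather than a compile-time diagnostic. The helper `regexMatchingAny(of:)` is likewise an assumption.

```swift
import Foundation

// Hypothetical runtime-constructed regex type, built from a String.
struct DynamicRegex {
    private let engine: NSRegularExpression

    // Throws if the pattern is malformed, since nothing checked it at compile time.
    init(_ pattern: String) throws {
        engine = try NSRegularExpression(pattern: pattern, options: [])
    }

    // Returns true if the pattern matches anywhere in the input.
    func matches(_ input: String) -> Bool {
        let range = NSRange(input.startIndex..., in: input)
        return engine.firstMatch(in: input, options: [], range: range) != nil
    }
}

// Builds an alternation from keywords only known at runtime, escaping metacharacters.
func regexMatchingAny(of keywords: [String]) throws -> DynamicRegex {
    let alternatives = keywords.map { NSRegularExpression.escapedPattern(for: $0) }
    return try DynamicRegex(alternatives.joined(separator: "|"))
}
```
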

-Chris

