> On Jan 25, 2017, at 12:23 PM, Joe Groff via swift-evolution
> <swift-evolution@swift.org> wrote:
>
>
>> On Jan 24, 2017, at 9:35 PM, Chris Lattner via swift-evolution
>> <swift-evolution@swift.org> wrote:
>>
>> On Jan 24, 2017, at 12:05 AM, Chris Eidhof via swift-evolution
>> <swift-evolution@swift.org> wrote:
>>>
>>> I agree that being able to implement parsers in a nice way can be a huge
>>> step forward in being really good at string processing.
>>
>> +1 from me as well, I agree with Joe that Swift can learn a lot from Perl 6
>> grammar’s and we should take the time to do it right. Below I say “regex” a
>> lot, but I really mean a more general grammar system (and even Perl 5
>> regex’s aren’t regular :-)
>>
>>> There are a couple of possibilities that come to mind directly:
>>>
>>> 1. Build parsers right into the language (like Perl 6 grammars)
>>> 2. Provide a parser combinator language (e.g.
>>> https://github.com/davedufresne/SwiftParsec).
>>> 3. Rely on external tools like bison/yacc/etc.
>>> 4. Make it easy for people to write hand-written parsers (e.g. by providing
>>> an NSScanner alternative).
>>
>>
>> My opinion is that #1 is the right path to start with, but it wouldn’t
>> preclude doing #2. Here’s my rationale / half-baked thought process:
>>
>> There are two important use cases for regex's: the literal case (e.g.
>> /aa+b*/) and the dynamically computed case. The former is really what we’re
>> talking about here, the latter should obviously be handled with some sort of
>> Regex type which can be formed from string values or whatever. Regex
>> literals in an expression context should default to producing the Regex type
>> of course.
>>
>> This means that when you pass a regex literal into an API call (e.g. split
>> on a string), it is really just creating something of Regex type, and
>> passing it down. If you wanted to introduce a parser combinator DSL, you
>> could totally plug it into the system, by having the combinators produce
>> something of the Regex type.
>>
>> So why bless regex literals with language support at all? I see several
>> reasons:
>>
>> 1. Diagnostics: These will be heavily used by people, and you want to have
>> good compiler error and warning messages for them. You want to be able to
>> validate the regex at compile time, not wait until runtime to detect
>> syntactic mistakes like unbalanced parens.
>>
>> 2. Syntax Familiarity: To take advantage of people’s familiarity with other
>> languages, we should strive to make the basic regex syntax familiar and
>> obvious. I’d argue that /aa+b*/ should “just work” and do the thing you
>> think it does. Relying on a combinator library to do that would be crazy.
>>
>> 3. Performance: Many regex’s are actually regular, so they can be trivially
>> compiled into DFAs. There is a well understood body of work that can be
>> simply dropped into the compiler to do this. Regex’s that are not regular
>> can be compiled into hybrid DFA/NFA+backtracking schemes, and allowing a
>> divide and conquer style of compiler optimization to do this is the path
>> that makes the most sense (to me at least). Further, if you switch on a
>> string and have a bunch of cases that are regex’s, you’d obviously want the
>> compiler to generate a single state machine (like a lexer), not check each
>> pattern in series.
>>
>> 4. Pattern matching greatness: One of the most obnoxious/error prone aspects
>> of regex’s in many languages is that when you match a pattern, the various
>> matches are dumped into numbered result values (often by the order of the
>> parens in the pattern). This is totally barbaric: it begs for off by one
>> errors, often breaks as the program is being evolved/maintained, etc. It is
>> just as bad as printf/scanf!
>>
>> You should instead be able to directly bind subexpressions into local
>> variables. For example if you were trying to match something like “42:
>> Chris”, you should be able to use straw man syntax like this:
>>
>> case /(let id: \d+): (let name: \w+)/: print(id); print(name)
>>
>> Unless we were willing to dramatically expand how patterns work, this
>> requires baking support into the language.
>>
>> 5. Scanner/“Formatter” integration: Taking the above one step farther, we
>> could have default patterns for known types (and make it extensible to user
>> defined types of course). For example, \d+ is the obvious pattern for
>> integers, so you should be able to write the above like this (in principle):
>>
>> case /(let id: Int): (let name: \w+)/: print(id); print(name)
>>
>> In addition to avoiding having to specify \d+ all the time, this eliminates
>> the need for a “string to int” conversion after the pattern is matched,
>> because id would be bound as type Int already.
>>
>>
>> Anyway, to summarize, I think that getting regex’s into the language is
>> really important and expect them to be widely used. As such, I think it is
>> worth burning compiler/language complexity to make them be truly great in
>> Swift.
>
> Another good reason to bake them into pattern matching is that it would make
> it easier to optimize when you want to match one of multiple patterns. Often,
> you don't want just one grammar, but possibly one of many, and it'd be nice
> if switch-ing over multiple string patterns led to a reasonably efficient
> DFA/NFA/rec-descent machine based on the needs of the grammars being matched.

+1 to the comments by Chris and Joe. We should do as much as we can in the library, but compile-time error detection, optimizability and syntactic convenience are important reasons to bake some support into the language itself.

>
> -Joe
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
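
To make the library-only baseline concrete, here is a rough sketch of the “42: Chris” example from Chris's point 4, written against Foundation's NSRegularExpression. The helper name parseEntry and the sample input are made up for illustration. It is only meant to show the three pain points discussed above: the pattern is validated at runtime, captures are addressed by number, and the string-to-Int conversion is a separate step.

```swift
import Foundation

// Hypothetical helper (not from any library): parses input such as
// "42: Chris" into an id and a name.
func parseEntry(_ text: String) -> (id: Int, name: String)? {
    // The pattern is an ordinary string, so a syntax error such as an
    // unbalanced paren only surfaces when this initializer throws at runtime.
    guard let regex = try? NSRegularExpression(pattern: "^(\\d+): (\\w+)$") else {
        return nil
    }
    let whole = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole),
          // Captures are addressed by position: 1 is the id, 2 is the name.
          // Adding a group ahead of these silently shifts both indices.
          let idRange = Range(match.range(at: 1), in: text),
          let nameRange = Range(match.range(at: 2), in: text),
          // A separate string-to-Int conversion is still needed after matching.
          let id = Int(text[idRange])
    else {
        return nil
    }
    return (id, String(text[nameRange]))
}

if let entry = parseEntry("42: Chris") {
    print(entry.id)
    print(entry.name)
}
```

With the straw man syntax quoted above, each of those three steps would presumably be handled by the compiler rather than by hand.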