--------------------------------------------
On Tue, 4/14/15, Peter Otten <__pete...@web.de> wrote:

Subject: Re: [Tutor] Regular expression on python
To: tutor@python.org
Date: Tuesday, April 14, 2015, 4:37 PM

Steven D'Aprano wrote:

> On Tue, Apr 14, 2015 at 10:00:47AM +0200, Peter Otten wrote:
>> Steven D'Aprano wrote:
>
>> > I swear that Perl has been a blight on an entire generation of
>> > programmers. All they know is regular expressions, so they turn every
>> > data processing problem into a regular expression. Or at least they
>> > *try* to. As you have learned, regular expressions are hard to read,
>> > hard to write, and hard to get correct.
>> >
>> > Let's write some Python code instead.
> [...]
>
>> The tempter took possession of me and dictated:
>>
>> >>> pprint.pprint(
>> ...     [(k, int(v)) for k, v in
>> ...      re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
>> [('Input Read Pairs', 2127436),
>>  ('Both Surviving', 1795091),
>>  ('Forward Only Surviving', 17315),
>>  ('Reverse Only Surviving', 6413),
>>  ('Dropped', 308617)]
>
> Nicely done :-)
>

Yes, nice, but why do you use
re.compile(regex).findall(line)
and not
re.findall(regex, line)

I know what re.compile is for. I often use it outside a loop and then
actually use the compiled regex inside a loop; I just haven't seen the
way you use it before.

> I didn't say that it *couldn't* be done with a regex.

I didn't claim that.

> Only that it is
> harder to read, write, etc. Regexes are good tools, but they aren't the
> only tool and as a beginner, which would you rather debug? The extract()
> function I wrote, or r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*" ?

I know a rhetorical question when I see one ;)

> Oh, and for the record, your solution is roughly 4-5 times faster than
> the extract() function on my computer.

I wouldn't be bothered by that. See below if you are.

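For what it's worth, here is a small sketch comparing the two spellings
asked about above. The input line is an assumption: it is reconstructed
to fit the output quoted earlier, and the percentages are invented for
illustration. The names via_compile, via_module and pairs are mine, not
from the thread.

```python
import re

# Hypothetical input line, reconstructed to fit the quoted output;
# the percentages are made up for illustration only.
line = (
    "Input Read Pairs: 2127436 "
    "Both Surviving: 1795091 (84.38%) "
    "Forward Only Surviving: 17315 (0.81%) "
    "Reverse Only Surviving: 6413 (0.30%) "
    "Dropped: 308617 (14.51%)"
)

regex = r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*"

# Spelling 1: compile explicitly, then call the pattern's method.
via_compile = re.compile(regex).findall(line)

# Spelling 2: let the module-level function compile the pattern itself.
via_module = re.findall(regex, line)

# Both return a list of (name, digits) tuples; convert the digits to int.
pairs = [(k, int(v)) for k, v in via_compile]
print(via_compile == via_module)
print(pairs)
```

As far as I can tell the two calls give the same result for a one-off
use like this, so the choice looks like a matter of habit rather than
behaviour.
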
> If I knew the requirements were
> not likely to change (that is, the maintenance burden was likely to be
> low), I'd be quite happy to use your regex solution in production code,
> although I would probably want to write it out in verbose mode just in
> case the requirements did change:
>
>
> r"""(?x)    (?# verbose mode)

Personally, I prefer to be verbose about being verbose, i.e. use the
re.VERBOSE flag. But perhaps that's just a matter of taste. Are there
any use cases when the ?iLmsux operators are clearly a better choice
than the equivalent flag? For me, the mental burden of a regex is big
enough already without these operators.

>     (.+?):  (?# capture one or more characters, followed by a colon)
>     \s+     (?# one or more whitespace)
>     (\d+)   (?# capture one or more digits)
>     (?:     (?# don't capture ... )
>       \s+     (?# one or more whitespace)
>       \(.*?\) (?# anything inside round brackets)
>       )?      (?# ... and optional)
>     \s*     (?# ignore trailing spaces)
>     """

<snip>

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
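
To make the flag-versus-inline comparison concrete, here is a sketch of
the same pattern written both ways, using ordinary # comments instead of
(?# ...) groups. The names FLAG_PATTERN, INLINE_PATTERN and sample are
hypothetical, and the sample text is invented for illustration.

```python
import re

# Verbose mode requested through the flag argument.
FLAG_PATTERN = re.compile(r"""
    (.+?):      # capture one or more characters, followed by a colon
    \s+         # one or more whitespace
    (\d+)       # capture one or more digits
    (?:         # don't capture ...
      \s+       # one or more whitespace
      \(.*?\)   # anything inside round brackets
    )?          # ... and optional
    \s*         # ignore trailing spaces
    """, re.VERBOSE)

# Verbose mode requested inline; (?x) sits at the very start of the pattern.
INLINE_PATTERN = re.compile(r"""(?x)
    (.+?):      # capture one or more characters, followed by a colon
    \s+         # one or more whitespace
    (\d+)       # capture one or more digits
    (?:         # don't capture ...
      \s+       # one or more whitespace
      \(.*?\)   # anything inside round brackets
    )?          # ... and optional
    \s*         # ignore trailing spaces
    """)

# Invented sample text, not taken from the thread.
sample = "Both Surviving: 1795091 (84.38%) Dropped: 308617 (14.51%)"

flag_result = FLAG_PATTERN.findall(sample)
inline_result = INLINE_PATTERN.findall(sample)
print(flag_result == inline_result)
print(flag_result)
```

One case where the inline form seems useful is when the pattern is
passed around as a plain string (read from a config file, say) and the
caller has no way to supply a flags argument; otherwise the flag reads
more clearly to me.
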