[issue16563] re.match loops forever on simple regexp

2012-11-26 Thread L. Peter Deutsch
New submission from L. Peter Deutsch: I've read a number of reports of exponential-time regexp matching, but this regexp uses no unusual features, requires no backtracking, and only loops "forever" on certain input strings. I listed the Python version # as 2.6; I actually observed the behavior

[issue16563] re.match loops forever on simple regexp

2012-11-26 Thread Tim Peters
Tim Peters added the comment: There's actually enormous backtracking here. Try this much shorter regexp and you'll see much the same behavior: re_utf8 = r'^([\x00-\x7f]+)*$' That's the original re_utf8 with all but the first alternative removed. Looks like passing s[0:34] "works" because it

[issue16563] re.match loops forever on simple regexp

2012-11-26 Thread Ezio Melotti
Ezio Melotti added the comment: I think the problem is the first '+', but I'm not sure if this is similar to other problems (I remember something similar to (x+)* causing problems). (On a side note: the regex matches non-valid utf-8, see table 3-7 on http://www.unicode.org/versions/Unicode6.0.

[issue16563] re.match loops forever on simple regexp

2012-11-26 Thread Tim Peters
Tim Peters added the comment: Yes, if you remove the first "+", the example quickly prints None (i.e., reports that the regexp cannot match the string). -- ___ Python tracker __

[issue16563] re.match loops forever on simple regexp

2012-11-26 Thread L. Peter Deutsch
L. Peter Deutsch added the comment: It never occurred to me that the regexp package would be so poorly designed that a pattern that so clearly never requires backtracking could require exponential time. I'll change the pattern (taking out the + has no effect on what strings it matches) and lea

[issue16563] re.match loops forever on simple regexp

2012-11-27 Thread Mark Dickinson
Mark Dickinson added the comment: Closing as a duplicate of issue 1662581. -- nosy: +mark.dickinson resolution: -> duplicate status: open -> closed superseder: -> the re module can perform poorly: O(2**n) versus O(n**2) ___ Python tracker

[issue16563] re.match loops forever on simple regexp

2012-11-27 Thread Mark Dickinson
Mark Dickinson added the comment: @lpd: you may want to look at the 'regex' module on PyPI [1] to see if it solves your problems. The idea is that some form of this should eventually go into the standard library, but we've been lacking volunteers for the huge code review task involved. [1]