On Wed, Jun 1, 2011 at 4:03 PM, Bill Janssen <jans...@parc.com> wrote:
> I have a large RE (223613 chars) that works fine in CPython 2.6, but

That's truly horrible, but I assume you have a good reason for it.

> seems to produce an endless loop in IronPython (see below). I'm using
> Mono 2.10 (.NET 4.0.x) on Ubuntu, with IronPython 2.7. Anyone have
> pointers to the differences between them? Is
> System::Text::RegularExpressions in .NET configurable in some fashion
> that might help?

First off, is there a reason you don't use re.IGNORECASE? That would cut the regex in half, at least.

For the most part, CPython and IronPython regexes should be fairly compatible. IronPython takes the regex and massages it to work with System.Text.RegularExpressions, but the changes are pretty straightforward and small, and I don't think the re you provided hits any of them.

It's quite possible that the Mono version of System.Text.RegularExpressions can't handle the expression; you could test this by saving the full regex and building a small C# program that runs it.

The regex template has a lot of potential backtracking in it; are you sure it's not caught in a pathological (exponential) case?

Finally, is one ginormous regex really the best way to do this? Have you tried other approaches?

- Jeff

>
> I'm a .NET newbie.
>
> TIA,
>
> Bill
>
> --------------------------------------------------
> import sys, os, re
>
> try:
>     # we use the name lists in nltk to create person-name matching patterns
>     import nltk.data
> except ImportError:
>     sys.stderr.write("Can't import nltk; can't do name lists.\nSee http://www.nltk.org/.\n")
>     sys.exit(1)
> else:
>     __MALE_NAME_EXCLUDES = ("Hill",
>                             "Ave",
>                             )
>     __FEMALE_NAME_EXCLUDES = ()
>     __FEMALE_NAMES = [x for x in
>                       nltk.data.load("corpora/names/female.txt",
>                                      format="raw").split("\n")
>                       if (x and (x not in __FEMALE_NAME_EXCLUDES))]
>     __FEMALE_NAMES += [x.upper() for x in __FEMALE_NAMES]
>     __MALE_NAMES = [x for x in
>                     nltk.data.load("corpora/names/male.txt",
>                                    format="raw").split("\n")
>                     if (x and (x not in __MALE_NAME_EXCLUDES))]
>     __MALE_NAMES += [x.upper() for x in __MALE_NAMES]
>     __INITS = [chr(x) for x in range(ord('A'), ord('Z'))]
>
>     PERSON_PATTERN = re.compile(
>         "^((?P<honorific>Mr|Ms|Mrs|Dr|MR|MS|MRS|DR)\.? )?"    # honorific
>         "(?P<firstname>" +
>         "|".join(__FEMALE_NAMES + __MALE_NAMES + __INITS) +    # first name
>         ")"
>         "( (?P<middlename>([A-Z]\.)|(" +
>         "|".join(__FEMALE_NAMES + __MALE_NAMES) +              # middle initial or name
>         ")))?"
>         " +(?P<lastname>[A-Z][A-Za-z]+)",                      # space then last name
>         re.MULTILINE)
>
>     print PERSON_PATTERN.match("Mr. John Smith")
> _______________________________________________
> Users mailing list
> Users@lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
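
To make the "other approaches" question concrete, here is a minimal sketch of one possibility: keep a small regex that only describes the shape of a name line, and check the captured tokens against a set instead of compiling every name into one alternation. This is an illustration, not something either poster proposed or tested. The names KNOWN_NAMES, PERSON_SHAPE and match_person are hypothetical, the four-name tuple is a stand-in for the nltk lists, and it has not been run on IronPython or Mono.

```python
import re

# Assumption: a tiny stand-in for the nltk name lists. Real code would load
# corpora/names/female.txt and male.txt and apply the exclude tuples.
_SAMPLE_NAMES = ("John", "Mary", "Bill", "Jeff")

# Mirror the posted code: each name as listed, plus its upper-case form.
KNOWN_NAMES = frozenset(_SAMPLE_NAMES) | frozenset(n.upper() for n in _SAMPLE_NAMES)

# Structural pattern only: capitalised tokens, no per-name alternation.
PERSON_SHAPE = re.compile(
    r"^((?P<honorific>Mr|Ms|Mrs|Dr|MR|MS|MRS|DR)\.? )?"    # honorific
    r"(?P<firstname>[A-Z][A-Za-z]*)"                       # first name or initial
    r"( (?P<middlename>[A-Z]\.|[A-Z][A-Za-z]*))?"          # middle initial or name
    r" +(?P<lastname>[A-Z][A-Za-z]+)",                     # space then last name
    re.MULTILINE)


def match_person(text):
    """Return a dict of name parts, or None if text does not start with a name."""
    m = PERSON_SHAPE.match(text)
    if m is None:
        return None
    parts = m.groupdict()

    # First name: a bare initial or a listed name. (The posted
    # range(ord('A'), ord('Z')) stops at 'Y'; this accepts 'Z' as well.)
    first = parts["firstname"]
    if not (len(first) == 1 or first in KNOWN_NAMES):
        return None

    # Middle name: an initial with a period, or a listed name. Anything else
    # is re-read as the last name, which is what skipping the optional
    # middle-name group in the original pattern would do.
    middle = parts["middlename"]
    if middle is not None and not (middle.endswith(".") or middle in KNOWN_NAMES):
        if len(middle) < 2:
            return None  # too short to satisfy [A-Z][A-Za-z]+
        parts["middlename"] = None
        parts["lastname"] = middle
    return parts
```

With the stand-in list, match_person("Mr. John Smith") returns a dict whose honorific is "Mr", firstname is "John", middlename is None and lastname is "Smith". The pattern stays a couple of hundred characters long no matter how many names are loaded, so the regex engine never sees the large alternation.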