On Wed, 22 Jun 2005, Shidan wrote:
> Hi I have a list of regular expression patterns like such:
>
> thelist = ['^594694.*','^689.*','^241.*',
> '^241(0[3-9]|1[0145]|2[0-9]|3[0-9]|41|5[1-37]|6[138]|75|8[014579]).*']
>
>
> Now I want to iterate thru each of these like:
>
> for pattern in thelist:
>     regex=re.compile(pattern)
>     if regex.match('24110'):
>         the_pattern = pattern
>         .
>         .
>         sys.exit(0)
>
> but in this case it will pick thelist[2] and not the list[3] as I wanted
> to, how can I have it pick the pattern that describes it better from the
> list.

Hi Shidan,

Regular expressions don't have a concept of a "better match": a regular expression either matches a string or it doesn't. It's binary. There's no notion of the "specificity" of a regular expression match unless you define one yourself.

Intuitively, it sounds like you're treating anything that uses a wildcard as less match-worthy than something that uses a more specific construct like a character set. Does that sound right to you? If so, then perhaps we can write a function that calculates the "specificity" of a regular expression, so that 'thelist[2]' scores lower than 'thelist[3]'. You can then see which regular expressions match your string, and rank those in terms of specificity.

But it's important to realize that what you've asked for is actually a subjective measure of "best match", so we have to define exactly what "best" means to us. (Other people might consider shorter regular expressions better because they're easier to read!)

Tell us more about the problem, and we'll do what we can to help.

Best of wishes!

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
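Here is one possible sketch of the "rank by specificity" idea for Shidan's question. The scoring rule is purely an assumption made for illustration: strip a trailing '.*' from each pattern, then count how many characters of the input string the remaining pattern matches. A pattern that pins down more leading characters therefore scores higher. The helpers `specificity` and `best_match` are hypothetical names, not part of the re module.

```python
import re


def specificity(pattern, text):
    """Hypothetical score: the number of characters of `text` matched by
    `pattern` once a trailing '.*' has been removed.

    Returns None if the pattern does not match `text` at all.
    Assumption: a trailing '.*' is always a wildcard; an escaped '\\.*'
    would be mishandled by this naive strip.
    """
    # Drop the trailing '.*' so the wildcard can't soak up the whole string.
    core = pattern[:-2] if pattern.endswith('.*') else pattern
    m = re.match(core, text)
    if m is None:
        return None
    # End offset of the match = how many leading characters were pinned down.
    return m.end()


def best_match(patterns, text):
    """Return the matching pattern with the highest specificity score,
    or None if nothing matches. Ties go to the earliest pattern."""
    best, best_score = None, -1
    for pattern in patterns:
        score = specificity(pattern, text)
        if score is not None and score > best_score:
            best, best_score = pattern, score
    return best


thelist = ['^594694.*', '^689.*', '^241.*',
           '^241(0[3-9]|1[0145]|2[0-9]|3[0-9]|41|5[1-37]|6[138]|75|8[014579]).*']

# '^241' matches 3 characters of '24110'; the longer pattern matches all 5,
# so this prints thelist[3].
print(best_match(thelist, '24110'))
```

Because a trailing '.*' can always match the empty string, any pattern that matches with its wildcard still matches once the wildcard is stripped, so this ranking only reorders patterns that already match. It does not change which ones match.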