I have the following mostly working function to extract the first 4-digit year from some text. But a leading space confounds it for years starting with 20:

    import re

    def getyear(text):
        s = """(?:.*?(19\d\d)|(20\d\d).*?)"""
        p = re.compile(s,re.IGNORECASE|re.DOTALL) #|re.VERBOSE
        y = p.match(text)
        try:
            return y.group(1) or y.group(2)
        except:
            return ''

    >>> getyear('2002')
    '2002'
    >>> getyear(' 2002')
    ''
    >>> getyear(' 1902')
    '1902'

A regex of ".*?" means any number of any characters, with a non-greedy hunger (so to speak) right? Any ideas on what is causing this to fail?

Many thanks in advance,
Thomas
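
A possible explanation, offered as a sketch rather than a definitive diagnosis: the "|" splits the whole group into two branches, ".*?(19\d\d)" and "(20\d\d).*?", so only the 19xx branch gets the leading ".*?". Since match() anchors at the start of the string, the 20xx branch can only succeed when the text begins with "20". The name getyear_fixed below is hypothetical, and using search() instead of match() is one option among several.

```python
import re

# The pattern from the post, kept unchanged for comparison.
ORIGINAL = re.compile(r"(?:.*?(19\d\d)|(20\d\d).*?)", re.IGNORECASE | re.DOTALL)

# Assumed alternative: let search() scan for the year, so no leading ".*?" is needed.
YEAR = re.compile(r"(?:19|20)\d\d")

def getyear_fixed(text):
    """Return the first 19xx or 20xx year found in text, or '' if there is none."""
    m = YEAR.search(text)
    return m.group(0) if m else ""

# Show whether the original pattern matches, next to the alternative's result.
for sample in ("2002", " 2002", " 1902"):
    print(repr(sample), ORIGINAL.match(sample) is not None, repr(getyear_fixed(sample)))
```

Keeping match() would also work if the alternation is moved inside the capturing group, as in ".*?(19\d\d|20\d\d)", so that the non-greedy prefix applies to both cases.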