I have the following mostly working function to extract the first
four-digit year from some text. But a leading space confounds it for
years starting with 20:
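
import re

def getyear(text):
    # Raw string so the \d escapes reach the regex engine unchanged.
    s = r"""(?:.*?(19\d\d)|(20\d\d).*?)"""
    p = re.compile(s, re.IGNORECASE | re.DOTALL)  # |re.VERBOSE
    y = p.match(text)
    try:
        return y.group(1) or y.group(2)
    except AttributeError:
        # p.match() returned None, so there is no group to read.
        return ''
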
>>> getyear('2002')
'2002'
>>> getyear(' 2002')
''
>>> getyear(' 1902')
'1902'
A regex of ".*?" means any number of any characters, with a non-greedy
hunger (so to speak), right?
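
To make the non-greedy reading concrete, here is a small sketch comparing ".*?" with plain ".*" on the same input. The sample string and the names `lazy` and `greedy` are made up for illustration and are not part of getyear().

```python
import re

# Made-up sample text containing two four-digit runs.
text = "abc 1999 and 2002"

# Non-greedy: .*? consumes as little as possible, so the group
# captures the first four-digit run the engine can reach.
lazy = re.match(r".*?(\d{4})", text, re.DOTALL).group(1)

# Greedy: .* consumes everything and then backtracks, so the group
# captures the last four-digit run.
greedy = re.match(r".*(\d{4})", text, re.DOTALL).group(1)

print(lazy, greedy)  # 1999 2002
```

Both patterns match "any number of any characters"; they differ only in which end of the range the engine tries first.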
Any ideas on what is causing this to fail?
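
One likely explanation: "|" binds more loosely than anything else inside the group, so the pattern's two alternatives are ".*?(19\d\d)" and "(20\d\d).*?". The leading ".*?" belongs only to the 19.. branch, and match() anchors at the start of the string, so the 20.. branch can only succeed when the year is the very first thing in the text. A minimal sketch of one way around this, assuming that finding the first 19xx or 20xx run anywhere in the text is the goal (the name `getyear_sketch` and the pattern are hypothetical, not the original function):

```python
import re

# Hypothetical variant of getyear(); the name and pattern are assumptions.
# The alternation is confined to the century prefix, so both 19xx and
# 20xx years are treated identically.
YEAR = re.compile(r"(?:19|20)\d\d")

def getyear_sketch(text):
    # search() scans forward from the start, so no leading .*? is needed.
    m = YEAR.search(text)
    return m.group(0) if m else ''

print(repr(getyear_sketch(' 2002')))
```

Note that this sketch does not check what surrounds the four digits, so it would also pick a year-like run out of a longer number.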
Many thanks in advance,
Thomas