I'm parsing a utf-8 encoded file with lines characterized by placenames in
all caps thus:
HEREFORD, Herefordshire.
..other lines..
HÉRON (LE), Normandie.
..other lines..
I identify these lines for parsing using
    for line in data:
        if re.match(r'[A-Z]{2,}', line):
but of course this catches HEREFORD and not HÉRON, since [A-Z] only
covers the unaccented ASCII capitals A to Z.
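For reference, here is a minimal sketch that reproduces the behaviour. It assumes Python 3 and that the lines have already been decoded from utf-8 into str. The sample list stands in for my real data, and starts_with_caps is a hypothetical helper I made up for illustration: it falls back on str.isupper() rather than re, so it is a workaround and not the re test I'm asking about.

```python
import re

# The ASCII-only pattern from my loop.
ASCII_CAPS = re.compile(r'[A-Z]{2,}')


def starts_with_caps(line):
    """Hypothetical helper: True if the first two characters are both
    uppercase letters according to str.isupper(), accented or not."""
    head = line[:2]
    return len(head) == 2 and all(ch.isupper() for ch in head)


# Stand-in data; the accented capital is written as an escape
# (U+00C9) so the example does not depend on the source encoding.
sample = [
    "HEREFORD, Herefordshire.",
    "..other lines..",
    "H\u00c9RON (LE), Normandie.",
]

# re.match anchors at the start of the line, as in my loop.
ascii_hits = [line for line in sample if ASCII_CAPS.match(line)]
unicode_hits = [line for line in sample if starts_with_caps(line)]

print(ascii_hits)
print(unicode_hits)
```

The first list holds only the HEREFORD line, while the second holds both placename lines. The helper looks at single code points, so it would presumably miss a capital stored as a plain letter followed by a combining accent only if that sequence fell outside the first two characters' uppercase test; I have not checked my file for that.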
What sort of re test can I do to catch lines whose defining characteristic
is that they begin with two or more adjacent utf-8 encoded capital
letters?

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor