Norman Khine wrote:
hello,
import re
line = "ALSACE 67000 Strasbourg 24 rue de la Division Leclerc 03 88 23 05 66 strasbo...@artisansdumonde.org"
m = re.search('[\w\-][\w\-\...@[\w\-][\w\-\.]+[a-za-z]{1,4}', line)
emailAddress .search(r"(\d+)", line)
phoneNumber = re.compile(r'(\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2})')
phoneNumber.search(line)
But this jumbles the phone number and also includes the 67000.
How can I split the 'line' into a list?
thanks
norman
lst = line.split() will split the line on whitespace (any run of spaces, tabs, or newlines counts as one separator).
Before you can write code to parse a line, you have to know for sure the
syntax of that line. This particular one has 15 fields, delimited by
spaces. So you can parse it with str.split(), and use a slice to get the
elements holding the phone number: lst[9:14], i.e. elements 9 through 13.
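A minimal sketch of the split-and-slice approach for this exact 15-field line. The email address below is a hypothetical placeholder, since the archived one is obfuscated.

```python
# Sample record; "user@example.org" is a hypothetical placeholder address.
line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 user@example.org")

lst = line.split()            # split on runs of whitespace
assert len(lst) == 15         # this sketch assumes exactly 15 fields

phone = " ".join(lst[9:14])   # elements 9 through 13, rejoined with spaces
email = lst[14]               # the last field
print(phone)
print(email)
```

Note that the 67000 never gets mixed in, because we pick fields by position rather than by "looks like digits".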
If the address portion might be a variable number of words, then you
could still use split and a slice, but use negative indices to get the
phone number relative to the end: lst[-6:-1], i.e. elements -6 through -2.
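Here is a sketch of the negative-slice version. It assumes the line always ends with exactly five phone groups followed by one email address. The function name parse_record and the shortened sample address are made up for illustration.

```python
def parse_record(line):
    """Return (address, phone, email) for a hypothetical record format:
    any number of address words, then five phone groups, then one email."""
    lst = line.split()
    address = " ".join(lst[:-6])   # everything before the phone number
    phone = " ".join(lst[-6:-1])   # elements -6 through -2
    email = lst[-1]                # the last field
    return address, phone, email

full = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 user@example.org")
short = "ALSACE 67000 Strasbourg 24 rue Leclerc 03 88 23 05 66 user@example.org"
print(parse_record(full))
print(parse_record(short))
```

Both lines give the same phone number even though the address lengths differ.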
If the email address might have a space within it, then you have to get
fancier.
If the phone number might have more or fewer than 5 "words", you have to
get fancier.
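One way to get a bit fancier is sketched below, under assumptions that are mine and not a spec: the email is the last token and contains no spaces, and the phone number is the run of all-digit tokens just before it. The name parse_record_flexible is hypothetical.

```python
def parse_record_flexible(line):
    """Split a record into (address, phone, email).

    Assumes: the email is the last whitespace-separated token, and the
    phone number is the run of digit-only tokens immediately before it.
    """
    tokens = line.split()
    email = tokens[-1]
    i = len(tokens) - 1                 # index of the email token
    while i > 0 and tokens[i - 1].isdigit():
        i -= 1                          # step back over each digit-only token
    phone = " ".join(tokens[i:-1])
    address = " ".join(tokens[:i])
    return address, phone, email

full = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 user@example.org")
compact = "ALSACE 67000 Strasbourg 24 rue Leclerc 0388230566 user@example.org"
print(parse_record_flexible(full))
print(parse_record_flexible(compact))
```

This breaks as soon as an address can end in a bare number (it would be swallowed into the phone), or the phone is written as "+33 ...", which is exactly why you need the spec first.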
Without a spec, all the regular expressions in the world are just noise.
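Once you do have a spec, a regex can be written against it. The sketch below assumes a hypothetical one: five space-separated two-digit groups, then whitespace, then an email address that ends the line. Anchoring on the end of the line keeps 67000 out, and wrapping the whole number in one outer group (with a non-capturing inner group) returns it as a single string rather than five separate pieces.

```python
import re

# Hypothetical spec: "NN NN NN NN NN" followed by an email at end of line.
PHONE_THEN_EMAIL = re.compile(r"((?:\d{2} ){4}\d{2})\s+(\S+@\S+)$")

line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 user@example.org")

m = PHONE_THEN_EMAIL.search(line)
if m:
    phone, email = m.group(1), m.group(2)
    print(phone)
    print(email)
```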
DaveA
_______________________________________________
Tutor maillist - Tutor@python.org