Norman Khine wrote:
hello,

import re
line = "ALSACE 67000 Strasbourg 24 rue de la Division Leclerc 03 88 23 05 66 strasbo...@artisansdumonde.org"
m = re.search('[\w\-][\w\-\...@[\w\-][\w\-\.]+[a-za-z]{1,4}', line)
emailAddress .search(r"(\d+)", line)
phoneNumber = re.compile(r'(\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2})')
phoneNumber.search(line)

but this jumbles the phone number and also includes the 67000.

how can i split the 'line' into a list?

thanks
norman

lst = line.split() will split the line on runs of whitespace and give you a list.

Before you can write code to parse a line, you have to know the syntax of that line for sure. This particular one has 15 fields, delimited by spaces. So you can parse it with str.split() and use a slice to get the set of numbers that make up the phone number (elements 9 through 13, i.e. lst[9:14]).
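A minimal sketch of that fixed-position approach, assuming every line has exactly the 15 whitespace-separated fields of the sample in the question. The e-mail address is kept as it appears in the archived message, and the variable names are only illustrative:

```python
# Sample line from the question (address shown as archived).
line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")

fields = line.split()        # split on any run of whitespace
assert len(fields) == 15     # only true for this exact layout

phone_parts = fields[9:14]   # indices 9..13: the five two-digit groups
phone = " ".join(phone_parts)
print(phone)                 # 03 88 23 05 66
```

The slice end is exclusive, so fields[9:14] stops before the e-mail address at index 14.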

If the address portion might be a variable number of words, then you could still use split and slice, but use negative slice parameters to get the phone number relative to the end (elements -6 through -2, i.e. lst[-6:-1]).
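As a rough sketch of slicing from the end, assuming the line always finishes with five phone "words" followed by a single e-mail token, and that the region and postcode are always the first two fields (both are assumptions about the data, not something the sample alone can confirm):

```python
line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")

fields = line.split()

email = fields[-1]                # last token (shown as archived)
phone = " ".join(fields[-6:-1])   # the five tokens just before the e-mail
address = " ".join(fields[2:-6])  # whatever lies between postcode and phone

print(phone)     # 03 88 23 05 66
print(address)   # Strasbourg 24 rue de la Division Leclerc
```

Because the phone and e-mail are located relative to the end, the address in the middle can grow or shrink without changing the code.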

If the email address might have a space within it, then you have to get fancier.

If the phone number might have more or fewer than 5 "words", you have to get fancier.
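One possible way to "get fancier" is sketched below. It assumes the phone number is the first run of two or more two-digit groups separated by single spaces; that is a guess at a spec, and a house number followed directly by another two-digit token would fool it. The name PHONE_RE is only illustrative:

```python
import re

line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")

# A two-digit group followed by one or more " dd" groups, on word boundaries.
# The boundaries keep "67000" from matching, and a lone "24" is rejected
# because at least two groups are required.
PHONE_RE = re.compile(r"\b\d{2}(?: \d{2})+\b")

m = PHONE_RE.search(line)
phone = m.group(0) if m else None
print(phone)   # 03 88 23 05 66
```

Whether this rule is right depends entirely on what the real data looks like, which is the point about needing a spec first.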

Without a spec, all the regular expressions in the world are just noise.

DaveA

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
