Thank you for the replies, and Happy New Year!

On Thu, Dec 31, 2009 at 7:19 PM, Dave Angel <da...@ieee.org> wrote:
> Norman Khine wrote:
>>
>> Hello,
>>
>>>>> import re
>>>>> line = "ALSACE 67000 Strasbourg 24 rue de la Division Leclerc 03 88 23 05 66 strasbo...@artisansdumonde.org"
>>>>> m = re.search('[\w\-][\w\-\...@[\w\-][\w\-\.]+[a-za-z]{1,4}', line)
>>>>> emailAddress .search(r"(\d+)", line)
>>>>> phoneNumber = re.compile(r'(\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2})')
>>>>> phoneNumber.search(line)
>>
>> But this jumbles the phone number and also includes the 67000.
>>
>> How can I split the 'line' into a list?
>>
>> Thanks,
>> Norman
>>
>
> lst = line.split() will split the line strictly on whitespace.
>
> Before you can write code to parse a line, you have to know for sure the
> syntax of that line. This particular one has 15 fields, delimited by
> spaces, so you can parse it with str.split() and use a slice to get the
> set of numbers representing the phone number: elements 9 through 13,
> i.e. lst[9:14].
>
> If the address portion might be a variable number of words, you could
> still use split and slice, but use negative slice parameters to get the
> phone number relative to the end: elements -6 through -2, i.e. lst[-6:-1].
>
> If the email address might have a space within it, then you have to get
> fancier.
>
> If the phone number might have more or fewer than 5 "words", you have to
> get fancier.
>
> Without a spec, all the regular expressions in the world are just noise.
>
> DaveA
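A minimal sketch of the split-and-slice approach described above, using the sample line from the thread (the e-mail address is left truncated exactly as it appears in the archive). The record layouts here are assumptions for illustration only: a fixed 15-field record for the positive slice, and, for the regex variant, a "spec" in which each record ends with exactly five two-digit phone groups followed by one whitespace-free e-mail token. The names TAIL and parse_tail are hypothetical.

```python
import re

# Sample record from the thread (e-mail truncated as in the archive).
line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")

# str.split() with no argument splits on runs of whitespace.
fields = line.split()

# Fixed layout (assumed 15 fields): the phone number is
# elements 9 through 13, i.e. the slice [9:14].
phone_fixed = " ".join(fields[9:14])

# Variable-length address: count from the end instead. The e-mail is
# the last field and the five phone groups sit just before it: [-6:-1].
email = fields[-1]
phone_from_end = " ".join(fields[-6:-1])

# Hypothetical spec: a record ends with exactly five two-digit groups,
# a space, then one e-mail token containing no whitespace.
TAIL = re.compile(r"((?:\d{2} ){4}\d{2}) (\S+)$")


def parse_tail(record):
    """Return (phone, email) from the end of record, or None if the
    record does not follow the assumed layout."""
    m = TAIL.search(record)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(phone_fixed)       # → 03 88 23 05 66
print(phone_from_end)    # → 03 88 23 05 66
print(email)             # → strasbo...@artisansdumonde.org
print(parse_tail(line))  # → ('03 88 23 05 66', 'strasbo...@artisansdumonde.org')
```

The regex version is only as good as the spec it encodes: a phone number with a different grouping, or an e-mail address containing a space, makes parse_tail return None rather than a silently wrong answer.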
--
%>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] )