Re: [Tutor] using re to match text and extract info

Dave Angel Thu, 31 Dec 2009 10:21:07 -0800

Norman Khine wrote:

hello,

import re
line = "ALSACE 67000 Strasbourg 24 rue de la Division Leclerc 03 88 23 05 66 strasbo...@artisansdumonde.org"
m = re.search('[\w\-][\w\-\...@[\w\-][\w\-\.]+[a-za-z]{1,4}', line)
emailAddress .search(r"(\d+)", line)
phoneNumber = re.compile(r'(\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2})')
phoneNumber.search(line)


but this jumbles the phone number and also includes the 67000.

how can i split the 'line' into a list?

thanks
norman

lst = line.split()    will split the line strictly by whitespace.

Before you can write code to parse a line, you have to know for sure the syntax of that line. This particular one has 15 fields, delimited by spaces. So you can parse it with str.split(), and use slices to get the particular set of numbers representing the phone number (elements 9 through 13, that is, the slice lst[9:14]).
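As a minimal sketch of the split-and-slice approach, assuming every line has exactly the same 15-field layout as the sample (the email address is reproduced as the archive shows it, elision included):

```python
# Sample line from the question, copied as it appears in the archive.
line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")

lst = line.split()            # split strictly on whitespace
print(len(lst))               # 15 fields for this particular line

phone = " ".join(lst[9:14])   # elements 9 through 13: the five phone "words"
email = lst[14]               # the last field

print(phone)                  # 03 88 23 05 66
print(email)
```

The fixed indices only hold while the address stays at six words; any other length shifts the phone number.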

If the address portion might be a variable number of words, then you could still use split and slice, but use negative slice parameters to get the phone number relative to the end (elements -6 to -2, that is, the slice lst[-6:-1]).
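A sketch of the negative-slice variant follows. The function name parse_line and the field names are made up for illustration, and the layout is an assumption rather than a confirmed spec: three leading one-word fields (region, postal code, city), the email address last, and exactly five phone "words" before it.

```python
def parse_line(line):
    """Split one record, locating the phone number relative to the end.

    Assumed layout (not confirmed in the thread): region, postal code,
    city, a street address of any length, five phone fields, email.
    """
    lst = line.split()
    return {
        "region": lst[0],
        "postcode": lst[1],
        "city": lst[2],
        "address": " ".join(lst[3:-6]),   # whatever lies in between
        "phone": " ".join(lst[-6:-1]),    # elements -6 to -2
        "email": lst[-1],
    }


line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")
record = parse_line(line)
print(record["phone"])      # 03 88 23 05 66
print(record["address"])    # 24 rue de la Division Leclerc
```

Because the phone and email are counted from the end, a longer or shorter street address no longer moves them.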

If the email address might have a space within it, then you have to get fancier.

If the phone number might have more or less than 5 "words", you have to get fancier.
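One possible way to get fancier for the variable-length phone number, offered only as a sketch under an assumed spec: the line ends with two or more two-digit groups separated by single spaces, followed by an email address that contains no whitespace. The name TAIL is made up for illustration, and an email with a space in it is not handled.

```python
import re

# Assumed spec: "<anything> <two-digit groups...> <email>" with the email
# as the final whitespace-free token on the line.
TAIL = re.compile(r"\b(\d{2}(?: \d{2})+) (\S+)$")

line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")

m = TAIL.search(line)
if m:
    phone, email = m.groups()
    rest = line[:m.start()].rstrip()   # region, postal code, city, address
    print(phone)                       # 03 88 23 05 66
    print(email)
    print(rest)
```

Note the remaining ambiguity: if the street address ended in a two-digit number, the pattern would absorb it into the phone number, which is exactly why the format needs to be pinned down first.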


Without a spec, all the regular expressions in the world are just noise.

DaveA

