Re: [Tutor] using re to match text and extract info

Norman Khine Fri, 01 Jan 2010 03:16:39 -0800

Thank you for the replies and Happy New Year!

On Thu, Dec 31, 2009 at 7:19 PM, Dave Angel <da...@ieee.org> wrote:
> Norman Khine wrote:
>>
>> hello,
>>
>>
>>>>>
>>>>> import re
>>>>> line = "ALSACE 67000 Strasbourg 24 rue de la Division Leclerc 03 88 23
>>>>> 05 66 strasbo...@artisansdumonde.org"
>>>>> m = re.search('[\w\-][\w\-\...@[\w\-][\w\-\.]+[a-za-z]{1,4}', line)
>>>>> emailAddress .search(r"(\d+)", line)
>>>>> phoneNumber = re.compile(r'(\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2})')
>>>>> phoneNumber.search(line)
>>>>>
>>
>> but this jumbles the phone number and also includes the 67000.
>>
>> how can i split the 'line' into a list?
>>
>> thanks
>> norman
>>
>>
>
> lst = line.split()    will split the line strictly by whitespace.
>
> Before you can write code to parse a line, you have to know for sure the
> syntax of that line.  This particular one has 15 fields, delimited by
> spaces.  So you can parse it with str.split(), and use slices to get the
> particular set of numbers representing the phone number (elements 9 to 13, i.e. the slice lst[9:14]).
>
> If the address portion might be a variable number of words, then you could
> still use split and slice, but use negative slice parameters to get the
> phone number relative to the end (elements -6 to -2, i.e. the slice lst[-6:-1]).
>
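A minimal sketch of the split-and-slice idea described above, assuming the
line really does follow the 15-field layout. The sample string is copied from
the archived post, truncated address included; the variable names are only
illustrative.

```python
# Sample line as it appears in the archived post (the e-mail address was
# truncated by the archive and is kept as-is).
line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")

# Split strictly on whitespace; this sample yields 15 fields.
fields = line.split()

# Fixed layout: the phone number occupies fields 9 through 13.
phone_fixed = " ".join(fields[9:14])

# Variable-length address: count back from the end instead.
# The last field is the e-mail address, the five before it are the phone.
phone_from_end = " ".join(fields[-6:-1])
email = fields[-1]

print(len(fields))
print(phone_fixed)
print(phone_from_end)
print(email)
```

Both slices pick out the same five tokens here. The negative slice keeps
working if the street address gains or loses a word, as long as the phone
number stays at five groups and the e-mail address stays a single token.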
> If the email address might have a space within it, then you have to get
> fancier.
>
> If the phone number might have more or fewer than 5 "words", you have to get
> fancier.
>
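One way to "get fancier" is sketched below. It assumes a spec that is not
confirmed anywhere in this thread: every line ends with one or more two-digit
groups followed by a single token containing "@". The names TAIL and
parse_tail are hypothetical.

```python
import re

# Assumed spec (not confirmed by the thread): the line ends with one or more
# two-digit groups, each followed by a space, then a single token with "@".
# (?<!\d) stops the match from starting inside a longer number such as 67000.
TAIL = re.compile(r"(?<!\d)((?:\d{2} )+)(\S+@\S+)$")


def parse_tail(text):
    """Return (phone, email) taken from the end of text, or None."""
    m = TAIL.search(text)
    if m is None:
        return None
    return m.group(1).strip(), m.group(2)


# Sample line as archived (the e-mail address is truncated in the archive).
line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbo...@artisansdumonde.org")

print(parse_tail(line))
```

Because the pattern is anchored at the end of the line, the postcode and the
street number are never candidates for the phone number. If the real data
breaks the assumed spec (a space inside the address, a phone number written
without spaces), the function returns None rather than guessing.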
> Without a spec, all the regular expressions in the world are just noise.
>
> DaveA
>
>




-- 
%>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or
chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] )
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
