[Tutor] To: Wayne Werner , re Reg. Expressions Parenthesis

Chris Kavanagh Wed, 18 Jan 2012 16:12:30 -0800

For some reason I didn't get this email, found it in the archives. I wanted to make sure I thanked Wayne for the help!!!




On Tue, Jan 17, 2012 at 3:07 AM, Chris Kavanagh <[hidden email]> wrote:
Hey guys, girls, hope everyone is doing well.

Here's my question: when using Regular Expressions, the docs say that when using parentheses, it "captures" the data. This has got me confused (doesn't take much), can someone explain this to me, please??

Here's an example to use. It's kinda long, so, if you'd rather provide your own shorter ex, that'd be fine. Thanks for any help as always.


Here's a quick example:

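import re

data = 'Wayne Werner fake-phone: 501-555-1234, fake-SSN: 123-12-1234'

# With parentheses: two capturing groups.
parsed = re.search(r'(\d{3})-(\d{3}-\d{4})', data)
print(parsed.group())   # the whole match: 501-555-1234
print(parsed.groups())  # the captured pieces: ('501', '555-1234')

# Without parentheses: same match, but nothing is captured.
parsed = re.search(r'\d{3}-\d{3}-\d{4}', data)
print(parsed.group())   # 501-555-1234
print(parsed.groups())  # ()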

You'll notice that you can access the individual clusters using the .groups() method. This makes capturing the individual groups pretty easy. Of course, capturing isn't just for storing the results. You can also use the captured group later on.
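As a small sketch of what "capturing" gives you, here is the same data string with each group pulled out by number. The named-group version with (?P<area>...) and (?P<rest>...) is an addition for illustration; those names are made up, not something from the original example.

```python
import re

data = 'Wayne Werner fake-phone: 501-555-1234, fake-SSN: 123-12-1234'

# Two capturing groups: the first three digits, and the rest of the number.
parsed = re.search(r'(\d{3})-(\d{3}-\d{4})', data)
whole = parsed.group(0)   # the entire match
area = parsed.group(1)    # first parenthesised group
rest = parsed.group(2)    # second parenthesised group

# Named groups capture the same text but are looked up by name.
# The names 'area' and 'rest' are hypothetical, chosen for this sketch.
named = re.search(r'(?P<area>\d{3})-(?P<rest>\d{3}-\d{4})', data)
area_by_name = named.group('area')

print(whole, area, rest, area_by_name)
```

Groups are numbered by the position of their opening parenthesis, starting at 1; group 0 is always the whole match.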

Let's say, for some fictitious reason, you want to find every letter that appears as a double in some data. If you were to do this the "brute force" way you'd pretty much have to do something like this:


found = []
for i in range(len(data) - 1):
    # A "double" is a character equal to the one right after it.
    if data[i] == data[i + 1]:
        if data[i] not in found:
            found.append(data[i])
print(found)

The regex OTOH looks like this:

In [29]: data = 'aaabababbcacacceadbacdb'

In [32]: parsed = re.findall(r'([a-z])\1', data)

In [33]: parsed
Out[33]: ['a', 'b', 'c']
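One thing worth knowing about that call: when the pattern has a group, re.findall returns the captured text once per match, so a letter that is doubled in two separate places shows up twice. A minimal sketch of one way to handle that, assuming you only want each letter once; the function name doubled_letters is made up for this example.

```python
import re

def doubled_letters(text):
    """Return the distinct lowercase letters that appear twice in a row."""
    # ([a-z]) captures one letter; \1 must then match that same letter again.
    hits = re.findall(r'([a-z])\1', text)
    # findall gives one entry per match, so drop repeats but keep the order.
    return list(dict.fromkeys(hits))

print(doubled_letters('aaabababbcacacceadbacdb'))
print(doubled_letters('aabbaa'))
```

With 'aabbaa' the raw findall result contains 'a' twice; the dict.fromkeys step is what removes the repeat.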

Now, that example was super contrived and also simple. Very few real-world applications will be as simple as that one - usually you have much crazier specifications, like find every person who has blue eyes AND blue hair, but only if they're left handed. Assuming you had data that looked like this:


Name    Eye Color  Hair Color  Handedness  Favorite type of potato
Wayne   Blue       Brown       Dexter      Mashed
Sarah   Blue       Blonde      Sinister    Spam(?)
Kane    Green      White       Dexter      None
Kermit  Blue       Blue        Sinister    Idaho


You could parse out the data using captures and backreferences [1].
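A rough sketch of what that could look like, assuming the columns are separated by spaces or tabs, the first four values in each row are single words, and "Sinister" is the marker for left-handed. The variable names are made up for the example.

```python
import re

table = """\
Name    Eye Color  Hair Color  Handedness  Favorite type of potato
Wayne   Blue       Brown       Dexter      Mashed
Sarah   Blue       Blonde      Sinister    Spam(?)
Kane    Green      White       Dexter      None
Kermit  Blue       Blue        Sinister    Idaho
"""

# Group 1 captures the name, group 2 the eye color.
# \2 then requires the hair color to be the same text as group 2.
pattern = re.compile(r'^(\w+)[ \t]+(Blue)[ \t]+\2[ \t]+Sinister\b',
                     re.MULTILINE)

matches = [m.group(1) for m in pattern.finditer(table)]
print(matches)
```

Using [ \t]+ rather than \s+ keeps a match from running across a line break, and re.MULTILINE lets ^ anchor at the start of every row.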

HTH,
Wayne

[1] In this situation, of course, regex is overkill. It's easier to just .split() and compare. But if you're parsing something really nasty like EDI then sometimes a regex is just the best way to go [2].

[2] When people start to understand regexes they're like the proverbial man who only has a hammer. As Jamie Zawinski said [3], "Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems." I've come across very few occasions that regexes were actually useful, and it's usually extracting very specifically formatted data (money, phone numbers, etc.) from copious amounts of text. I've not yet had a need to actually process words with it. Especially using Python.


[3] http://regex.info/blog/2006-09-15/247
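For comparison, the .split()-and-compare route from footnote [1] might look something like this. It is only a sketch, assuming each data row has at least four whitespace-separated fields and that "Sinister" means left-handed; the function name and the rows list are made up for the example.

```python
rows = [
    'Wayne   Blue   Brown   Dexter    Mashed',
    'Sarah   Blue   Blonde  Sinister  Spam(?)',
    'Kane    Green  White   Dexter    None',
    'Kermit  Blue   Blue    Sinister  Idaho',
]

def blue_haired_blue_eyed_lefties(lines):
    """Names of people with blue eyes, blue hair, and a 'Sinister' hand."""
    names = []
    for line in lines:
        # Only the first four columns matter; they are single words here.
        name, eyes, hair, hand = line.split()[:4]
        if eyes == 'Blue' and hair == 'Blue' and hand == 'Sinister':
            names.append(name)
    return names

print(blue_haired_blue_eyed_lefties(rows))
```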

_______________________________________________
Tutor maillist  -  [hidden email]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Thanks again Wayne.