For some reason I didn't get this email; I found it in the archives. I wanted to make sure I thanked Wayne for the help!!!



On Tue, Jan 17, 2012 at 3:07 AM, Chris Kavanagh <[hidden email]> wrote:
Hey guys, girls, hope everyone is doing well.

Here's my question: when using regular expressions, the docs say that parentheses "capture" the data. This has me confused (it doesn't take much). Can someone explain this to me, please??

Here's an example to use. It's kinda long, so if you'd rather provide your own shorter example, that'd be fine. Thanks for any help, as always.

Here's a quick example:

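import re

data = 'Wayne Werner fake-phone: 501-555-1234, fake-SSN: 123-12-1234'

# With parentheses: each (...) captures part of the match as a group
parsed = re.search(r'(\d{3})-(\d{3}-\d{4})', data)
print(parsed.group())   # 501-555-1234
print(parsed.groups())  # ('501', '555-1234')

# Same pattern without parentheses: it still matches, but captures nothing
parsed = re.search(r'\d{3}-\d{3}-\d{4}', data)
print(parsed.group())   # 501-555-1234
print(parsed.groups())  # ()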

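You'll notice that the version with parentheses hands you the individual pieces through the .groups() method, while the version without them returns an empty tuple. That makes pulling out each captured group pretty easy. Of course, capturing isn't just for storing the results: you can also refer back to a captured group later in the same pattern, using a backreference.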
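To make the group numbering concrete, here is a small sketch reusing the phone-number string: group(0) is the whole match, group(1) and group(2) are the parenthesized captures, and re.sub can reuse them in its replacement string through \1 and \2. The "(501) 555-1234" output format is only an illustrative choice, not something from the original email.

```python
import re

data = 'Wayne Werner fake-phone: 501-555-1234, fake-SSN: 123-12-1234'
pattern = r'(\d{3})-(\d{3}-\d{4})'

parsed = re.search(pattern, data)
# group(0), same as group(), is the whole match; group(1), group(2), ...
# are the parenthesized captures, numbered from left to right
print(parsed.group(0))  # 501-555-1234
print(parsed.group(1))  # 501
print(parsed.group(2))  # 555-1234

# The captures can also be reused in a replacement string:
# \1 and \2 stand for whatever groups 1 and 2 matched
reformatted = re.sub(pattern, r'(\1) \2', data)
print(reformatted)
# Wayne Werner fake-phone: (501) 555-1234, fake-SSN: 123-12-1234
```

The SSN is left alone because "123-12-1234" does not fit the 3-3-4 digit pattern.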

Let's say that, for some fictitious reason, you want to find every letter that appears doubled in some data. If you were to do this the "brute force" way, you'd pretty much have to do something like this:

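data = 'aaabababbcacacceadbacdb'
found = []
for i in range(len(data) - 1):
    # A "double" is a letter equal to the one right after it;
    # record each such letter only once
    if data[i] == data[i + 1] and data[i] not in found:
        found.append(data[i])
print(found)  # ['a', 'b', 'c']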

The regex, OTOH, looks like this (\1 is a backreference: it matches whatever text group 1 just captured):

data = 'aaabababbcacacceadbacdb'
parsed = re.findall(r'([a-z])\1', data)
print(parsed)  # ['a', 'b', 'c']
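To see that \1 matches the same *text* rather than the same *pattern*, it may help to compare it with simply repeating the group. This is a small illustrative sketch using the same sample string:

```python
import re

data = 'aaabababbcacacceadbacdb'

# \1 must repeat the exact text that group 1 captured,
# so only a letter followed by the same letter matches
doubled = re.findall(r'([a-z])\1', data)
print(doubled)  # ['a', 'b', 'c']

# Repeating the group itself only repeats the pattern: any two letters
# match, and findall reports the last letter group 1 captured in each pair
pairs = re.findall(r'([a-z]){2}', data)
print(pairs)  # ['a', 'b', 'b', 'b', 'c', 'c', 'c', 'e', 'd', 'a', 'd']
```

Note also that findall returns only the captured group, not the full two-letter match, whenever the pattern contains exactly one group.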

Now, that example was super contrived and also simple. Very few real-world applications will be as simple as that one; usually you have much crazier specifications, like finding every person who has blue eyes AND blue hair, but only if they're left-handed. Say you had data that looked like this:

Name     Eye Color   Hair Color   Handedness   Favorite type of potato
Wayne    Blue        Brown        Dexter       Mashed
Sarah    Blue        Blonde       Sinister     Spam(?)
Kane     Green       White        Dexter       None
Kermit   Blue        Blue         Sinister     Idaho


You could parse out the data using captures and backreferences [1].
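A minimal sketch of what that could look like, assuming the table is stored as whitespace-separated text and every name is a single word. The pattern and variable names are illustrative, not taken from the original email. Here \2 simply repeats "Blue"; swapping (Blue) for (\w+) would instead find anyone whose eye and hair colors match.

```python
import re

# The sample table, assumed to be whitespace-separated text
table = """\
Name     Eye Color   Hair Color   Handedness   Favorite type of potato
Wayne    Blue        Brown        Dexter       Mashed
Sarah    Blue        Blonde       Sinister     Spam(?)
Kane     Green       White        Dexter       None
Kermit   Blue        Blue         Sinister     Idaho
"""

# Group 1 captures the name and group 2 the eye color (Blue);
# \2 is a backreference requiring the hair color to be the same text,
# and the row must then say Sinister (left-handed).
# re.MULTILINE makes ^ match at the start of every line.
pattern = re.compile(r'^(\w+)\s+(Blue)\s+\2\s+Sinister\b', re.MULTILINE)

blue_lefties = [m.group(1) for m in pattern.finditer(table)]
print(blue_lefties)  # ['Kermit']
```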

HTH,
Wayne

[1] In this situation, of course, a regex is overkill: it's easier to just .split() each line and compare the fields. But if you're parsing something really nasty like EDI, then sometimes a regex is just the best way to go[2].
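For comparison, the .split() approach this footnote mentions might look like the following sketch, assuming the same whitespace-separated table where every data row splits into exactly five fields:

```python
table = """\
Name     Eye Color   Hair Color   Handedness   Favorite type of potato
Wayne    Blue        Brown        Dexter       Mashed
Sarah    Blue        Blonde       Sinister     Spam(?)
Kane     Green       White        Dexter       None
Kermit   Blue        Blue         Sinister     Idaho
"""

blue_lefties = []
# Skip the header row, whose column names contain spaces
for line in table.splitlines()[1:]:
    name, eyes, hair, handedness, potato = line.split()
    if eyes == 'Blue' and hair == 'Blue' and handedness == 'Sinister':
        blue_lefties.append(name)
print(blue_lefties)  # ['Kermit']
```

Plain comparisons like these are easier to read and extend than a pattern once the conditions pile up.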

[2] When people start to understand regexes, they're like the proverbial man who only has a hammer. As Jamie Zawinski said[3], "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." I've come across very few occasions on which regexes were actually useful, and those usually involved extracting very specifically formatted data (money, phone numbers, etc.) from copious amounts of text. I've not yet needed one to actually process words, especially in Python.

[3] http://regex.info/blog/2006-09-15/247

_______________________________________________
Tutor maillist  -  [hidden email]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Thanks again, Wayne.