On Wed, Jan 11, 2012 at 7:34 AM, Marco Casazza <marco.vince...@gmail.com> wrote:
> Hello,
>
> I've been slowly teaching myself python, using it for small projects when it
> seems appropriate. In this case, I was handed a list of email addresses for
> a mailing but some of them had been truncated. There are only 21 possible
> email "suffixes" so I planned to just identify which it should be and then
> replace it. However, when I started writing the code I realized that I'd be
> doing a lot of "repeating". Is there a better way to "fix" the suffixes
> without doing each individually? Here's my working code (for 4 colleges):
>
> import re
> with file('c:\python27\mvc\mailing_list.txt', 'r') as infile:
>    outlist = []
>    for line in infile.read().split('\n'):
>        if line.rstrip().lower().endswith('edu'):
>            newline = line + '\n'
>            outlist.append(newline.lower())
>        elif re.search("@bar", line):
>            newline = re.sub("@bar.*", "@baruch.cuny.edu", line)+'\n'
>            outlist.append(newline.lower())
>        elif re.search("@bcc", line):
>            newline = re.sub("@bcc.*", "@bcc.cuny.edu", line)+'\n'
>            outlist.append(newline.lower())
>        elif re.search("@bmc", line):
>            newline = re.sub("@bmc.*", "@bmcc.cuny.edu", line)+'\n'
>            outlist.append(newline.lower())
>        elif re.search("@leh", line):
>            newline = re.sub("@leh.*", "@lehman.cuny.edu", line)+'\n'
>            outlist.append(newline.lower())
>
> with file('c:\python27\mvc\output.txt','w') as outfile:
>    outfile.writelines(outlist)
>
> Thanks,
> Marco
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

First, look here about reading files:
http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects

I like this better:
    f = open('filename', 'r')
    for line in f:
        # each line still ends with its newline, so strip it off
        print(line.rstrip('\n'))
    f.close()

Second, make a dictionary whose keys are the first few characters that
come after the @ in your truncated file.  Each value is the complete
text you want:
    d = {"bar": "baruch.cuny.edu", "bcc": "bcc.cuny.edu",
         "bmc": "bmcc.cuny.edu", "leh": "lehman.cuny.edu"}  # and so on

Third, use line.split('@') to split the line into what comes before
and after the @ sign.  It returns a list:
    address_parts = line.split('@')
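
To make the split step concrete, here is a small sketch of what
split('@') hands back. The sample addresses are made up for
illustration. The last case matters: a line with no @ gives a
one-element list, so address_parts[1] would raise an IndexError there.

```python
# Made-up sample lines, not taken from the real mailing list.
samples = ["jsmith@bar", "jsmith@lehman.cuny.edu", "no-at-sign"]

# split('@') returns a list of the pieces around each @ sign.
parts = [s.split("@") for s in samples]

for s, p in zip(samples, parts):
    print("%s -> %r" % (s, p))

# Only lines that really contain an @ have a second element to inspect.
has_domain = [len(p) == 2 for p in parts]
```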

address_parts[0] is what you want to keep as is. I'm guessing that the
3 characters after the @ will be enough to identify what the full
address should look like, so:
    key = address_parts[1][0:3]
    if key in d:
        result = '@'.join([address_parts[0], d[key]])

Then write the result to your output file.
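
Pulling the steps above together, a rough sketch might look like the
following. The names SUFFIXES, fix_address and fix_file are my own
inventions, the dictionary only holds the four colleges from the
original post, and it assumes the first 3 characters after the @ are
enough to pick the right suffix. Addresses that already end in "edu",
have no @, or start with an unknown prefix are passed through
unchanged (lowercased, as in the original code).

```python
# Hypothetical names throughout; extend SUFFIXES to cover all 21 suffixes.
SUFFIXES = {
    "bar": "baruch.cuny.edu",
    "bcc": "bcc.cuny.edu",
    "bmc": "bmcc.cuny.edu",
    "leh": "lehman.cuny.edu",
}


def fix_address(line, suffixes=SUFFIXES):
    """Return the lowercased address, completing a truncated domain."""
    address = line.strip().lower()
    if address.endswith("edu") or "@" not in address:
        return address  # already complete, or nothing we can repair
    # Split once on the @: the local part is kept, the domain is looked up.
    local, domain = address.split("@", 1)
    full = suffixes.get(domain[:3])
    if full is None:
        return address  # unknown prefix: leave it alone
    return "@".join([local, full])


def fix_file(in_path, out_path, suffixes=SUFFIXES):
    """Read addresses one per line and write the repaired ones out."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            if line.strip():  # skip blank lines
                outfile.write(fix_address(line, suffixes) + "\n")
```

Adding a college is then one new dictionary entry rather than another
elif branch, which is the repetition the original code was fighting.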

It's early in the morning for me, and this is untested, but it might
give you some ideas.

-- 
Joel Goldstick
