On 2012-01-11 07:57, Joel Goldstick wrote:
On Wed, Jan 11, 2012 at 7:34 AM, Marco Casazza <marco.vince...@gmail.com> wrote:
Hello,

I've been slowly teaching myself Python, using it for small projects when it
seems appropriate. In this case, I was handed a list of email addresses for
a mailing, but some of them had been truncated. There are only 21 possible
email "suffixes", so I planned to identify which one each address should have
and then replace it. However, when I started writing the code I realized that
I'd be doing a lot of repetition. Is there a better way to fix the suffixes
without handling each one individually? Here's my working code (for 4 colleges):

import re

# Raw strings (r'...') stop the backslashes in the Windows paths from being
# treated as escape sequences; open() works in Python 2 and 3, unlike file().
with open(r'c:\python27\mvc\mailing_list.txt', 'r') as infile:
    outlist = []
    for line in infile.read().split('\n'):
        if line.rstrip().lower().endswith('edu'):
            # Already a complete address: keep it as is.
            newline = line + '\n'
            outlist.append(newline.lower())
        elif re.search("@bar", line):
            newline = re.sub("@bar.*", "@baruch.cuny.edu", line)+'\n'
            outlist.append(newline.lower())
        elif re.search("@bcc", line):
            newline = re.sub("@bcc.*", "@bcc.cuny.edu", line)+'\n'
            outlist.append(newline.lower())
        elif re.search("@bmc", line):
            newline = re.sub("@bmc.*", "@bmcc.cuny.edu", line)+'\n'
            outlist.append(newline.lower())
        elif re.search("@leh", line):
            newline = re.sub("@leh.*", "@lehman.cuny.edu", line)+'\n'
            outlist.append(newline.lower())
        # Lines that match none of the branches are silently dropped.

with open(r'c:\python27\mvc\output.txt', 'w') as outfile:
    outfile.writelines(outlist)

Thanks,
Marco
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
First, have a look at the tutorial section on reading files:
http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects

I like this better:
     with open('filename', 'r') as f:
         for line in f:
             # Each line arrives WITH its trailing newline,
             # so strip it before using the line.
             line = line.rstrip('\n')
             print(line)
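A minimal sketch of that loop, written for Python 3 and using `io.StringIO` as a stand-in for the real file so it runs anywhere (the addresses are made up). Iterating over a file object yields each line with its trailing newline still attached, so the sketch strips it explicitly; `read_lines` is just an illustrative name:

```python
import io


def read_lines(f):
    """Yield each line of an open text file without its trailing newline."""
    for line in f:
        # The file iterator keeps the '\n' at the end of each line
        # (the last line may lack one), so remove it here.
        yield line.rstrip("\n")


# io.StringIO stands in for open('filename', 'r') in this sketch.
print(list(io.StringIO("a\nb\n")))  # ['a\n', 'b\n']: newlines are kept
fake_file = io.StringIO("jdoe@bar\nasmith@bcc.cuny.edu\n")
lines = list(read_lines(fake_file))
print(lines)  # ['jdoe@bar', 'asmith@bcc.cuny.edu']
```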

Second, make a dictionary with the key being what comes after the @
in your truncated file. The value will be the complete text you want:
  d = {"bar": "baruch.cuny.edu",
       "bcc": "bcc.cuny.edu",
       "bmc": "bmcc.cuny.edu",
       "leh": "lehman.cuny.edu"}   # ...plus the other 17 suffixes
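With 21 suffixes, the dictionary could also be built from the list of full domains instead of typed by hand. A sketch, assuming the key is the first three characters of each full domain (true for the four colleges in the original code); the hypothetical helper `build_suffix_table` refuses to build the table if two domains share a key, because one entry would otherwise silently overwrite the other:

```python
from collections import Counter

# The four full domains from the original post; the other 17 would go here.
DOMAINS = ["baruch.cuny.edu", "bcc.cuny.edu", "bmcc.cuny.edu", "lehman.cuny.edu"]


def build_suffix_table(domains, key_len=3):
    """Map the first key_len characters of each domain to the full domain.

    Raises ValueError if two domains share a key, since one would
    silently overwrite the other in the dictionary.
    """
    counts = Counter(dom[:key_len] for dom in domains)
    clashes = sorted(k for k, n in counts.items() if n > 1)
    if clashes:
        raise ValueError("ambiguous keys: %s" % ", ".join(clashes))
    return {dom[:key_len]: dom for dom in domains}


d = build_suffix_table(DOMAINS)
print(d["bmc"])  # bmcc.cuny.edu
```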

Third, use line.split('@') to split the line into what comes before
and after the @ sign. It returns a list:
     address_parts = line.split('@')
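For illustration (made-up addresses), here is what `split('@')` returns. A line without an `@` gives a one-element list, so `address_parts[1]` would raise `IndexError`; `str.partition('@')`, a different method from the `split` suggested above, always returns three parts and makes that case easy to spot:

```python
print("jdoe@baruc".split('@'))   # ['jdoe', 'baruc']
print("no-at-sign".split('@'))   # ['no-at-sign']  (only one element)

# str.partition always returns a 3-tuple: (before, separator, after).
# The separator is '' when '@' is not present.
before, sep, after = "jdoe@baruc".partition('@')
print(before, sep, after)           # jdoe @ baruc
print("no-at-sign".partition('@'))  # ('no-at-sign', '', '')
```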

address_parts[0] is what you want to keep as is. I'm guessing that the
3 characters after the @ will be enough to identify what the full
address should look like, so
if address_parts[1][0:3] in d:
   result = '@'.join([address_parts[0], d[address_parts[1][0:3]]])

Write the result to your output file.

It's early in the morning for me, and this is untested, but it might
give you some ideas.
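Here is one way those steps might fit together, as a sketch written for Python 3. It assumes one address per line, treats anything ending in `edu` as already complete (as the original code does), and uses `io.StringIO` in place of the real files; `fix_address`, `fix_file`, and the sample addresses are hypothetical. Unlike the original `elif` chain, which silently drops lines it can't match, it returns those lines so they can be checked by hand:

```python
import io

# Hypothetical table: first three characters after '@' -> full domain.
# Only the four colleges from the original post are listed.
SUFFIXES = {
    "bar": "baruch.cuny.edu",
    "bcc": "bcc.cuny.edu",
    "bmc": "bmcc.cuny.edu",
    "leh": "lehman.cuny.edu",
}


def fix_address(line, suffixes):
    """Return the repaired, lower-cased address, or None if it can't be fixed."""
    address = line.strip().lower()
    if address.endswith("edu"):
        return address                     # already complete
    address_parts = address.split("@")
    if len(address_parts) != 2:
        return None                        # no '@', or more than one
    full = suffixes.get(address_parts[1][0:3])
    if full is None:
        return None                        # unknown (or too short) prefix
    return "@".join([address_parts[0], full])


def fix_file(infile, outfile, suffixes):
    """Write repaired addresses to outfile; return the lines that couldn't be fixed."""
    unfixed = []
    for line in infile:
        if not line.strip():
            continue                       # skip blank lines
        fixed = fix_address(line, suffixes)
        if fixed is None:
            unfixed.append(line.rstrip("\n"))
        else:
            outfile.write(fixed + "\n")
    return unfixed


# io.StringIO objects stand in for the real input and output files.
src = io.StringIO("JDoe@Bar\nasmith@bmcc.cuny.edu\nbob@bmc\nzed@xyz\n")
dst = io.StringIO()
leftovers = fix_file(src, dst, SUFFIXES)
print(dst.getvalue())
print(leftovers)  # ['zed@xyz']
```

To run it on the real files, the two `StringIO` objects could be replaced with `with open(r'c:\python27\mvc\mailing_list.txt') as infile, open(r'c:\python27\mvc\output.txt', 'w') as outfile:` followed by the same `fix_file(infile, outfile, SUFFIXES)` call.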

Hi Joel,

Thanks. I like the dictionary idea... I hadn't thought of that because I was trying to fix one "problem", then realized I had more, and then yet more, so it just kept growing: a case of not seeing the forest for the trees. And if I split the address at the @ sign, I won't need to worry about exactly where it was truncated, so I won't need regular expressions to gather up the remaining characters after the key.
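Splitting at the `@` does make the truncation point irrelevant, as long as at least three characters survive after it. A variation that drops that requirement (a different technique from the fixed three-character key) is prefix matching: accept the text after `@` if exactly one known domain starts with it, and refuse to guess otherwise. A sketch, with the hypothetical helper `resolve_domain` and only the four domains from the original code:

```python
# Full domains from the original post; the other 17 would be added here.
DOMAINS = ["baruch.cuny.edu", "bcc.cuny.edu", "bmcc.cuny.edu", "lehman.cuny.edu"]


def resolve_domain(stub, domains=DOMAINS):
    """Return the one domain that starts with stub, or None if 0 or 2+ match."""
    matches = [dom for dom in domains if dom.startswith(stub.lower())]
    return matches[0] if len(matches) == 1 else None


# "b" is ambiguous (baruch, bcc, bmcc), so it resolves to None.
for address in ["jdoe@ba", "jdoe@baruc", "jdoe@baruch.cu", "jdoe@b"]:
    user, stub = address.split("@")
    print(address, "->", resolve_domain(stub))
```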

Thanks again,
Marco