On Wed, Jan 11, 2012 at 7:34 AM, Marco Casazza <marco.vince...@gmail.com> wrote:
> Hello,
>
> I've been slowly teaching myself python, using it for small projects when it
> seems appropriate. In this case, I was handed a list of email addresses for
> a mailing but some of them had been truncated. There are only 21 possible
> email "suffixes" so I planned to just identify which it should be and then
> replace it. However, when I started writing the code I realized that I'd be
> doing a lot of "repeating". Is there a better way to "fix" the suffixes
> without doing each individually? Here's my working code (for 4 colleges):
>
> import re
> with file('c:\python27\mvc\mailing_list.txt', 'r') as infile:
>     outlist = []
>     for line in infile.read().split('\n'):
>         if line.rstrip().lower().endswith('edu'):
>             newline = line + '\n'
>             outlist.append(newline.lower())
>         elif re.search("@bar", line):
>             newline = re.sub("@bar.*", "@baruch.cuny.edu", line)+'\n'
>             outlist.append(newline.lower())
>         elif re.search("@bcc", line):
>             newline = re.sub("@bcc.*", "@bcc.cuny.edu", line)+'\n'
>             outlist.append(newline.lower())
>         elif re.search("@bmc", line):
>             newline = re.sub("@bmc.*", "@bmcc.cuny.edu", line)+'\n'
>             outlist.append(newline.lower())
>         elif re.search("@leh", line):
>             newline = re.sub("@leh.*", "@lehman.cuny.edu", line)+'\n'
>             outlist.append(newline.lower())
>
> with file('c:\python27\mvc\output.txt','w') as outfile:
>     outfile.writelines(outlist)
>
> Thanks,
> Marco
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

First, look here about reading files:
http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects

I like this better:

    f = open('filename', 'r')
    for line in f:
        # This gives you one line at a time. Each line still ends
        # with its newline, so strip it before using the text.
        print line.rstrip()

Second, make a dictionary with the key being what comes after the @ in
your truncated file. The value will be the complete text you want:

    d = {"bcc": "bcc.cuny.edu"}  # etc., one entry per college

Third, use line.split('@') to split the line into what comes before and
after the @ sign. It will return a list:

    address_parts = line.split('@')

address_parts[0] is what you want to keep as is.

I'm guessing that the 3 characters after the @ will be enough to
identify what the full address should look like, so:

    if address_parts[1][0:3] in d:
        result = '@'.join([address_parts[0], d[address_parts[1][0:3]]])

Write the result to your out file.

It's early in the morning for me, and this is untested, but it might
give you some ideas.

--
Joel Goldstick
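
Putting the three steps above together, here is a minimal sketch rather than a finished script. The names SUFFIXES, fix_address and fix_file are made up for illustration, and the table holds only the four colleges from the original post; the other suffixes would be added the same way. It assumes, as guessed above, that the first 3 characters after the @ are enough to identify the college. Lines that have no @ or no known prefix are skipped, which matches what the original loop does.

```python
# Hypothetical lookup table: prefix after the '@' -> full domain.
# Only the four colleges from the original post are listed here.
SUFFIXES = {
    "bar": "baruch.cuny.edu",
    "bcc": "bcc.cuny.edu",
    "bmc": "bmcc.cuny.edu",
    "leh": "lehman.cuny.edu",
}


def fix_address(line, suffixes=SUFFIXES):
    """Return the repaired, lower-cased address, or None if it can't be fixed."""
    address = line.strip().lower()
    if address.endswith("edu"):
        return address  # already complete, keep as is
    # partition() returns (before, '@', after); the separator is '' if no '@'
    local, sep, domain = address.partition("@")
    if not sep:
        return None
    # Assumption: the first 3 characters of the domain identify the college
    full_domain = suffixes.get(domain[:3])
    if full_domain is None:
        return None
    return local + "@" + full_domain


def fix_file(in_path, out_path):
    """Read addresses from in_path and write the repaired ones to out_path."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            fixed = fix_address(line)
            if fixed is not None:
                outfile.write(fixed + "\n")
```

For example, fix_address("jsmith@leh") should give "jsmith@lehman.cuny.edu", and fix_file would be called with the paths of your input and output files. Keeping the table separate from the logic means that adding the remaining colleges is one new dictionary entry each, with no new elif branch.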