Hi Emile,

I made a mistake: I incorrectly assumed that the difference between the 54 lines of output and the 27 lines of desired output was the result of removing duplicate email addresses, i.e., [email protected], [email protected], [email protected], [email protected]
Apparently, this is not the case and I was wrong :( The solution to the problem is in the desired line output: [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] There were 27 lines in the file with From as the first word Not in the output of a subset. Latest output: set(['[email protected]', '[email protected]', ' [email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', ' [email protected]', '[email protected]', ' [email protected]']) ← Mismatch There were 54 lines in the file with From as the first word Latest revised code: fname = raw_input("Enter file name: ") if len(fname) < 1 : fname = "mbox-short.txt" fh = open(fname) count = 0 addresses = set() for line in fh: if line.startswith('From'): line2 = line.strip() line3 = line2.split() line4 = line3[1] addresses.add(line4) count = count + 1 print addresses print "There were", count, "lines in the file with From as the first word" Regards, Hal On Sat, Aug 1, 2015 at 5:45 PM, Emile van Sebille <[email protected]> wrote: > On 8/1/2015 4:07 PM, Ltc Hotspot wrote: > >> Hi Alan, >> >> Question1: The output result is an address or line? >> > > It's a set actually. Ready to be further processed I imagine. Or to > print out line by line if desired. > > Question2: Why are there 54 lines as compared to 27 line in the desired >> output? >> > > Because there are 54 lines that start with 'From'. 
>
> As I noted in looking at your source data, for each email there's a
> 'From ' and a 'From:' -- you'd get the right answer checking only for
> startswith('From ')
>
> Emile
>
>> Here is the latest revised code:
>> fname = raw_input("Enter file name: ")
>> if len(fname) < 1 : fname = "mbox-short.txt"
>> fh = open(fname)
>> count = 0
>> addresses = set()
>> for line in fh:
>>     if line.startswith('From'):
>>         line2 = line.strip()
>>         line3 = line2.split()
>>         line4 = line3[1]
>>         addresses.add(line4)
>>         count = count + 1
>> print addresses
>> print "There were", count, "lines in the file with From as the first word"
>>
>> The output result:
>> set(['[email protected]', '[email protected]', '[email protected]',
>> '[email protected]', '[email protected]', '[email protected]',
>> '[email protected]', '[email protected]', '[email protected]',
>> '[email protected]', '[email protected]']) ← Mismatch
>> There were 54 lines in the file with From as the first word
>>
>> The desired output result:
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> [email protected]
>> There were 27 lines in the file with From as the first word
>>
>> Regards,
>> Hal
>>
>> On Sat, Aug 1, 2015 at 1:40 PM, Alan Gauld <[email protected]> wrote:
>>
>>> On 01/08/15 19:48, Ltc Hotspot wrote:
>>>
>>>> There is an indent message in the revised code.
>>>> Question: Where should I indent the code line for the loop?
>>>>
>>> Do you understand the role of indentation in Python?
>>> Everything in the indented block is part of the structure,
>>> so you need to indent everything that should be executed
>>> as part of the logical block.
>>>
>>>> fname = raw_input("Enter file name: ")
>>>> if len(fname) < 1 : fname = "mbox-short.txt"
>>>> fh = open(fname)
>>>> count = 0
>>>> addresses = set()
>>>> for line in fh:
>>>> if line.startswith('From'):
>>>> line2 = line.strip()
>>>> line3 = line2.split()
>>>> line4 = line3[1]
>>>> addresses.add(line)
>>>> count = count + 1
>>>>
>>> Everything after the if line should be indented an extra level
>>> because you only want to do those things if the line
>>> startswith From.
>>>
>>> And note that, as I suspected, you are adding the whole line
>>> to the set when you should only be adding the address.
>>> (ie line4). This would be more obvious if you had
>>> used meaningful variable names such as:
>>>
>>> strippedLine = line.strip()
>>> tokens = strippedLine.split()
>>> addr = tokens[1]
>>> addresses.add(addr)
>>>
>>> PS.
>>> Could you please delete the extra lines from your messages.
>>> Some people pay by the byte and don't want to receive kilobytes
>>> of stuff they have already seen multiple times.
>>>
>>> --
>>> Alan G
>>> Author of the Learn to Program web site
>>> http://www.alan-g.me.uk/
>>> http://www.amazon.com/author/alan_gauld
>>> Follow my photo-blog on Flickr at:
>>> http://www.flickr.com/photos/alangauldphotos

_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
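
For reference, here is a minimal sketch of Emile's suggestion: match on 'From ' (with the trailing space) so that the 'From:' header lines are skipped. It also collects addresses in a list rather than a set, on the assumption that the desired output keeps repeated addresses, as the 27-line listing in the thread suggests. The function name from_line_addresses and the sample lines are hypothetical stand-ins for reading mbox-short.txt, and the print calls are written so the sketch also runs under Python 3, unlike the Python 2 code posted in the thread.

```python
def from_line_addresses(lines):
    """Return the second token of every line that starts with 'From '."""
    addresses = []  # a list keeps repeated senders; a set would drop them
    for line in lines:
        # The trailing space excludes header lines such as 'From: ...'.
        if line.startswith('From '):
            tokens = line.split()
            if len(tokens) > 1:  # guard against a bare 'From ' line
                addresses.append(tokens[1])
    return addresses


# Hypothetical sample data standing in for the contents of mbox-short.txt.
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "Subject: first message\n",
    "From bob@example.com Fri Jan  4 18:10:48 2008\n",
    "From: bob@example.com\n",
    "From alice@example.com Fri Jan  4 16:10:39 2008\n",
    "From: alice@example.com\n",
]

addresses = from_line_addresses(sample)
for addr in addresses:
    print(addr)
print("There were %d lines in the file with From as the first word"
      % len(addresses))
```

With the real file, the same function could be called as from_line_addresses(open(fname)). In the sample above, startswith('From') would match six lines, while startswith('From ') matches only the three envelope lines, which mirrors the 54 versus 27 difference discussed in the thread.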
