On 8/1/2015 4:07 PM, Ltc Hotspot wrote:
Hi Alan,
Question 1: Is the output result an address or a line?
It's actually a set, ready to be processed further, I imagine, or to be
printed out line by line if desired.
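Here is a minimal sketch of what Emile means. It uses hypothetical addresses that are not taken from the real mbox file. A set stores each value only once and has no guaranteed order, so printing it "line by line" means looping over it. Here, sorted() only makes the order predictable. The sketch runs under Python 3 as well as under the Python 2 used in this thread.

```python
# Hypothetical addresses standing in for the real mbox data.
seen = ["alice@example.com", "bob@example.org", "alice@example.com"]

addresses = set(seen)            # the repeated address collapses into one entry
for addr in sorted(addresses):   # sorted() gives a stable, readable order
    print(addr)
```

Because repeats collapse, a set can never reproduce a listing in which the same sender appears several times.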
Question 2: Why are there 54 lines, compared to 27 lines in the desired
output?
Because there are 54 lines that start with 'From'.
As I noted when looking at your source data, each email has both a
'From ' line and a 'From:' line; you'd get the right answer by checking
only for startswith('From ').
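Emile's point can be checked on a small excerpt laid out like an mbox file. The addresses and dates in it are made up. This is a minimal sketch, and count_prefix is an illustrative helper that is not part of the original code.

```python
# Hypothetical excerpt in mbox layout: each message has a "From "
# envelope line and a "From:" header line.
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "Subject: hello\n",
    "From bob@example.org Fri Jan  4 18:10:48 2008\n",
    "From: bob@example.org\n",
]


def count_prefix(lines, prefix):
    """Count the lines that begin with the given prefix."""
    return sum(1 for line in lines if line.startswith(prefix))


print(count_prefix(sample, "From"))   # 4: matches both "From " and "From:"
print(count_prefix(sample, "From "))  # 2: matches only the envelope lines
```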
Emile
Here is the latest revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if line.startswith('From'):
        line2 = line.strip()
        line3 = line2.split()
        line4 = line3[1]
        addresses.add(line4)
        count = count + 1
print addresses
print "There were", count, "lines in the file with From as the first word"
The output result:
set(['stephen.marqu...@uct.ac.za', 'lo...@media.berkeley.edu',
'zq...@umich.edu', 'rjl...@iupui.edu', 'c...@iupui.edu', 'gsil...@umich.edu',
'wagne...@iupui.edu', 'antra...@caret.cam.ac.uk',
'gopal.ramasammyc...@gmail.com', 'david.horw...@uct.ac.za',
'r...@media.berkeley.edu']) ← Mismatch
There were 54 lines in the file with From as the first word
The desired output result:
stephen.marqu...@uct.ac.za
lo...@media.berkeley.edu
zq...@umich.edu
rjl...@iupui.edu
zq...@umich.edu
rjl...@iupui.edu
c...@iupui.edu
c...@iupui.edu
gsil...@umich.edu
gsil...@umich.edu
zq...@umich.edu
gsil...@umich.edu
wagne...@iupui.edu
zq...@umich.edu
antra...@caret.cam.ac.uk
gopal.ramasammyc...@gmail.com
david.horw...@uct.ac.za
david.horw...@uct.ac.za
david.horw...@uct.ac.za
david.horw...@uct.ac.za
stephen.marqu...@uct.ac.za
lo...@media.berkeley.edu
lo...@media.berkeley.edu
r...@media.berkeley.edu
c...@iupui.edu
c...@iupui.edu
c...@iupui.edu
There were 27 lines in the file with From as the first word
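The desired output lists every sender in file order, with repeats included, and a set cannot do that. Below is a minimal sketch of one way to get that shape. It assumes the input is any iterable of lines, so an open file works, and it uses the 'From ' check Emile suggests. The names extract_senders and report are hypothetical. The sketch avoids raw_input, so it runs under both Python 2 and 3.

```python
def extract_senders(lines):
    """Return the sender of every 'From ' envelope line, in file order."""
    senders = []
    for line in lines:
        # 'From ' (with the trailing space) skips the 'From:' header lines.
        if line.startswith('From '):
            words = line.split()
            senders.append(words[1])  # the address is the second word
    return senders


def report(lines):
    """Print each sender on its own line, then the count; return the count."""
    senders = extract_senders(lines)
    for sender in senders:
        print(sender)
    print("There were %d lines in the file with From as the first word"
          % len(senders))
    return len(senders)
```

For example, `with open("mbox-short.txt") as fh: report(fh)` would print one address per envelope line, followed by the count. Collecting the senders into a list, rather than printing them directly, lets you test the extraction on its own.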
Regards,
Hal
On Sat, Aug 1, 2015 at 1:40 PM, Alan Gauld <alan.ga...@btinternet.com> wrote:
On 01/08/15 19:48, Ltc Hotspot wrote:
The revised code gives an indentation error message.
Question: Where should I indent the code lines for the loop?
Do you understand the role of indentation in Python?
Everything in the indented block is part of the structure,
so you need to indent everything that should be executed
as part of the logical block.
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if line.startswith('From'):
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses.add(line)
    count = count + 1
Everything after the if line should be indented an extra level
because you only want to do those things if the line
starts with 'From'.
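Python rejects the layout Alan describes when it compiles the code, before any line runs. Below is a minimal sketch that uses the built-in compile() on two hypothetical fragments. In the first fragment, the line after the if sits at the same level as the if. In the second, it is indented one more level.

```python
# Hypothetical fragment: the body of the 'if' is not indented under it.
broken = (
    "for line in ['From a@example.com']:\n"
    "    if line.startswith('From'):\n"
    "    addr = line.split()[1]\n"
)

# The same fragment with the 'if' body indented one extra level.
fixed = (
    "for line in ['From a@example.com']:\n"
    "    if line.startswith('From'):\n"
    "        addr = line.split()[1]\n"
)


def check(source):
    """Return 'ok' if source compiles, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except IndentationError as exc:
        return type(exc).__name__


print(check(broken))  # IndentationError
print(check(fixed))   # ok
```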
And note that, as I suspected, you are adding the whole line
to the set when you should only be adding the address.
(i.e. line4). This would be more obvious if you had
used meaningful variable names such as:
strippedLine = line.strip()
tokens = strippedLine.split()
addr = tokens[1]
addresses.add(addr)
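Alan's renamed steps might look like this when placed in a complete loop. This minimal sketch assumes the input is any iterable of lines, and it uses the 'From ' prefix Emile recommends. The name unique_senders is hypothetical. The sketch keeps Alan's variable names for continuity, and because it still collects into a set, each sender appears only once.

```python
def unique_senders(lines):
    """Collect each distinct sender address from 'From ' envelope lines."""
    addresses = set()
    for line in lines:
        if line.startswith('From '):
            strippedLine = line.strip()
            tokens = strippedLine.split()
            addr = tokens[1]
            addresses.add(addr)  # add the address, not the whole line
    return addresses
```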
PS.
Could you please trim the extra quoted lines from your messages?
Some people pay by the byte and don't want to receive kilobytes
of stuff they have already seen multiple times.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor