On 16-Jun-2013, at 09:21, Mark Lawrence <breamore...@yahoo.co.uk> wrote:

> On 16/06/2013 16:55, Chris “Kwpolska” Warrick wrote:
>> On Sat, Jun 15, 2013 at 7:22 AM, Patrick Williams <pdw0...@gmail.com> wrote:
>>> Hi so I am making a bit of code to extract a bit of numbers data from a file
>>> and then find the average of that data, however while I can get the code to
>>> extract each specific piece of data I need, I can't seem to get the numbers
>>> to add separately  so I can get a proper average. My sum1 variable seems to
>>> only take the last bit of data entered. I was just wondering if anyone knows
>>> what I'm doing wrong, the course I'm following hadn't started using regex
>>> (or even proper lists) at this point, so there must be a way to do it
>>> without. here's the code. the average of the data should be 0.6789 or
>>> something, but I get 0.0334343 or something.
>>> 
>>> count=0
>>> lst=list()
>> 
>> `lst = []` is the preferred syntax.
>> 
>>> fname='mbox-short.txt'
>>> fhand=open(fname)
>>> for line in fhand:
>>>     if line.startswith('X-DSPAM-Confidence:'):
>>>         count=count+1
>>>         colpos=line.find(':')
>>>         zpos=line.find('0',colpos)
>>>         num=float(line[zpos:50])
>>>         sum1=0+num
>>>         avg=float(sum1)/int(count)
> 
> I'll assume unless someone tells me differently that sum1 does not need 
> reinitialising every time, and that avg needs to be calculated when the loop 
> has finished.
> 
>>> print 'Count-', count,'--', 'Average-', avg
>>> 
>>> Any help at all is appreciated, and thanks in advance.
>>> 
>> 
>> I don’t know what file you used, but the message you sent got this
>> header from Gmail, and the format doesn’t seem to be much different:
>> 
>>> X-Spam-Evidence: '*H*': 0.79; '*S*': 0.00; 'separately': 0.09;
>>>        'wrong,': 0.09; 'subject:question': 0.10; 'code.': 0.18;
>>>        'variable': 0.18; 'bit': 0.19; 'advance.': 0.19; 'seems': 0.21;
>>>        '8bit%:5': 0.22; 'print': 0.22; 'skip:l 30': 0.24; '\xa0so': 0.24;
>>> [snip 11 more lines]
>> (replaced tabstops with spaces)
>> 
>> Can you guess what’s wrong in your code?
>> 
>> You are reading only the first line.
> 
> What does "for line in fhand:" do then?

I think that was referring to the assumption that you're reading mail 
header lines from that file, and a single header can be split across multiple 
lines (see the example cited above).  If that's the case, then "for line in 
fhand" will iterate over every line in the file, but you're only looking for 
lines which start with "X-Spam-..", and that will only ever be the FIRST line 
of a header that is split like that.

If your file is NOT organized like that, then your situation is different.  
However, if your files are like that, you're going to randomly miss data if the 
fields you're looking for don't happen to be on the first line of the 
multi-line header.
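
If the file does use folded headers, one way to cope is to glue the 
continuation lines back onto the header they belong to before testing with 
startswith().  This is only a rough sketch: it assumes continuation lines 
begin with a space or tab (as in the X-Spam-Evidence example above), and the 
unfold() helper and the sample lines are made up for illustration.

```python
def unfold(lines):
    """Yield complete header lines, joining indented continuation lines."""
    current = None
    for line in lines:
        line = line.rstrip('\n')
        if line[:1] in (' ', '\t') and current is not None:
            # Indented line: treat it as a continuation of the previous header.
            current += ' ' + line.strip()
        else:
            # A new header (or any other line) starts; emit the finished one.
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current


# Hypothetical sample data, shaped like the headers quoted in this thread.
sample = [
    "X-Spam-Evidence: '*H*': 0.79; '*S*': 0.00;\n",
    "       'variable': 0.18; 'bit': 0.19;\n",
    "X-DSPAM-Confidence: 0.8475\n",
]

headers = list(unfold(sample))
for header in headers:
    if header.startswith('X-Spam-Evidence:'):
        # The whole header is now on one line, including 'bit': 0.19.
        evidence = header
```

Note that this treats every indented line as a continuation, which is only 
safe while you are inside the header block of a message; body text can start 
with whitespace too.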

Now if you are reading RFC-822 (et al) standard mail messages in those files, 
the Python standard library's email package (and the mailbox module, for mbox 
files) will be immensely useful to you in parsing out those headers rather 
than trying to do it yourself.  That's something you're going to find to be 
the case frequently with Python.
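
As a minimal sketch of that approach, here is the standard email package 
pulling one header out of each message and averaging the values.  The two 
messages are invented for the example, and it assumes the header you want is 
X-DSPAM-Confidence as in the original code.  Note that the running total is 
kept across iterations and the average is only computed once the loop is 
done, which is the point Mark made above.

```python
import email

# Hypothetical messages; in practice these would come from your file.
raw_messages = [
    "From: a@example.com\nX-DSPAM-Confidence: 0.8475\n\nbody one\n",
    "From: b@example.com\nX-DSPAM-Confidence: 0.6178\n\nbody two\n",
]

total = 0.0   # initialised once, before the loop
count = 0
for raw in raw_messages:
    msg = email.message_from_string(raw)
    value = msg['X-DSPAM-Confidence']   # None if the header is absent
    if value is not None:
        total += float(value)           # accumulate, don't overwrite
        count += 1

# Compute the average after the loop has finished.
avg = total / count if count else 0.0
```

For a real mbox file, mailbox.mbox(path) from the same standard library 
should let you iterate over the messages and index headers the same way, 
though I haven't tried it against your file.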
 
> 
> -- 
> "Steve is going for the pink ball - and for those of you who are watching in 
> black and white, the pink is next to the green." Snooker commentator 
> 'Whispering' Ted Lowe.
> 
> Mark Lawrence
> 
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

