On 15/02/13 07:55, Michael McConachie wrote:

Essentially:

1.  I have a list of numbers that already exist in a file.  I generate this 
file by parsing info from logs.
2.  Each line contains an integer on it (corresponding to the number of 
milliseconds that it takes to complete a certain repeated task).
3.  There are over a million entries in this file, one per line; at any given 
time it can be just a few thousand, or more than a million.

    Example:
    -------
    173
    1685
    1152
    253
    1623


A million entries sounds like a lot to you or me, but to your computer, it's 
not. When you start talking tens or hundreds of millions, that's possibly a lot.

Do you know how to read those numbers into a Python list? Here is the "baby 
step" way to do so:


data = []  # Start with an empty list.
f = open("filename")  # Obviously you have to use the actual file name.
for line in f:  # Read the file one line at a time.
    num = int(line)  # Convert each line into an integer (whole number)
    data.append(num)  # and append it to the end of the list.
f.close()  # Close the file when done.


Here's a more concise way to do it:

with open("filename") as f:
    data = [int(line) for line in f]
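
If the log parser ever leaves a blank line in the file, int() will raise a ValueError on it. Here is a minimal sketch of a reader that skips such lines. The function name read_timings and the throwaway demo file are my own inventions for illustration, and I am only assuming that blank lines might occur in your data:

```python
import os
import tempfile


def read_timings(path):
    """Return the integers in `path`, one per line, skipping blank lines."""
    with open(path) as f:
        # int() ignores surrounding whitespace, including the newline.
        return [int(line) for line in f if line.strip()]


# Small demonstration using a throwaway file with one blank line in it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("173\n1685\n\n1152\n253\n1623\n")

data = read_timings(tmp.name)
os.remove(tmp.name)

print(data)       # [173, 1685, 1152, 253, 1623]
print(sum(data))  # 4886
```

Anything that is neither blank nor a whole number will still raise an error, which is probably what you want: it tells you the file is not what you expected.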



Once you have that list of numbers, you can sum the whole lot:

sum(data)


or just a range of the items:

sum(data[:100])  # The first 100 items.

sum(data[100:200])  # The second 100 items.

sum(data[-50:])  # The last 50 items.

sum(data[1000:])  # Item 1001 to the end.  (See below.)

sum(data[5:99:3])  # Every third item, from index 5 up to and including index 98.



This is called "slicing", and it is perhaps the most powerful and useful 
technique that Python gives you for dealing with lists. The rules, though, are 
not necessarily the most intuitive.


A slice is either a pair of numbers separated with a colon, inside the square 
brackets:

    data[start:end]

or a triple:

    data[start:end:step]

Any of these three numbers can be left out. The default values are:

start=0
end=length of the sequence being sliced
step=1

They can also be negative. If start or end is negative, it is interpreted as 
counting "from the end" rather than "from the beginning". A negative step walks 
through the sequence backwards.

Item positions are counted from 0, which will be very familiar to C 
programmers. The start index is included in the slice; the end index is 
excluded.
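
To make the defaults and the negative indexes concrete, here is a small sketch you can paste into the interpreter. The list called sample is made up purely for illustration:

```python
sample = [10, 20, 30, 40, 50, 60]

# Leaving everything out is the same as spelling out all three defaults.
print(sample[:] == sample[0:len(sample):1])  # True

print(sample[:2])   # [10, 20]          start omitted, so it begins at index 0
print(sample[4:])   # [50, 60]          end omitted, so it runs to the end
print(sample[::2])  # [10, 30, 50]      step of 2, start and end omitted
print(sample[-2:])  # [50, 60]          negative start counts from the end
print(sample[:-2])  # [10, 20, 30, 40]  negative end counts from the end
print(sample[1:4])  # [20, 30, 40]      index 1 included, index 4 excluded
```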

The model that you should think of is to imagine the sequence of items labelled 
with their index, starting from zero, and with a vertical line *between* each 
position. Here is a sequence of 26 items, showing the index in the first line 
and the value in the second:


|0|1|2|3|4|5|6|7|8|9| ... |25|
|a|b|c|d|e|f|g|h|i|j| ... |z |

When you take a slice, the cuts are always made at the vertical line to the 
left of each index you give. So, if the above is called "letters", we have:

letters[0:4]  # returns "abcd"

letters[2:8]  # returns "cdefgh"

letters[2:8:2]  # returns "ceg"

letters[-3:]  # returns "xyz"
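
If you want to check these for yourself, the standard library already has the alphabet as a string, so the following should run as-is. Slicing a string gives back a string; slicing a list works the same way and gives back a list:

```python
import string

letters = string.ascii_lowercase  # "abcdefghijklmnopqrstuvwxyz"

print(letters[0:4])    # abcd
print(letters[2:8])    # cdefgh
print(letters[2:8:2])  # ceg
print(letters[-3:])    # xyz
```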



Eventually what I'll need to do is:

1.  Index the file and/or count the lines, as to identify each line's 
positional relevance so that it can average any range of numbers that are 
sequential; one to one another.


No need. Python already does that, automatically, when you read the data into a 
list.
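
A quick sketch of what that means in practice, using the five sample values from your example. Line 1 of the file ends up at index 0, line 2 at index 1, and so on:

```python
data = [173, 1685, 1152, 253, 1623]

print(data[0])    # 173, the first line of the file
print(data[3])    # 253, the fourth line
print(len(data))  # 5, the number of lines read

# enumerate() hands you each value together with its position,
# should you ever need both at once.
for index, value in enumerate(data):
    print("%d: %d" % (index, value))
```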



2.  Calculate the difference between any given (x) range.  In order to be able to 
ask the program to average every 5, 10, 100, 100, or 10,000 etc. -->  until 
completion.  This includes the need to deal with stray remainders at the end of 
the file that aren't divisible by that initial requested range.

I don't quite understand you here. First you say "difference", then you say 
"average". Can you show a sample of data, say, 10 values, and the sorts of typical 
calculations you want to perform, with the answers you expect to get?


For example, here are 10 numbers:


103, 104, 105, 109, 111, 112, 115, 120, 123, 128


Here are the running averages of 3 values:

(103+104+105)/3

(104+105+109)/3

(105+109+111)/3

(109+111+112)/3

(111+112+115)/3

(112+115+120)/3

(115+120+123)/3

(120+123+128)/3


Is that what you mean? If so, then Python can deal with this trivially, using slicing. 
With your data stored in list "data", as above, I can say:


for i in range(0, len(data)-2):  # The last group of 3 starts 3 from the end.
    print(sum(data[i:i+3]))


to print the running sums taking three items at a time.
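
If it is the averages you are after rather than the sums, the same idea can be wrapped in a function. This is only a sketch under my guess at what you mean; the name running_averages is made up, and float() is there so that the division gives a fractional answer on Python 2 as well as Python 3:

```python
def running_averages(values, size):
    """Average each run of `size` consecutive items, moving along one item at a time."""
    # The last full group starts at index len(values) - size, hence the + 1.
    return [sum(values[i:i + size]) / float(size)
            for i in range(len(values) - size + 1)]


data = [103, 104, 105, 109, 111, 112, 115, 120, 123, 128]
averages = running_averages(data, 3)

print(len(averages))  # 8, one per group listed above
print(averages[0])    # 104.0, which is (103+104+105)/3
print(averages[1])    # 106.0, which is (104+105+109)/3
```

Each group is summed from scratch here, which is fine for small group sizes. For very large groups over a million items, keeping a running total would be faster, but I would not worry about that until you know you need it.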



The rest of your post just confuses me. Until you explain exactly what 
calculations you are trying to perform, I can't tell you how to perform them :-)
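
That said, here is one guess at what "average every 5, 10, 100" with "stray remainders" might mean: split the data into consecutive blocks that do not overlap, average each block, and let the final block be shorter if the data runs out. The name block_averages is hypothetical, and averaging the short final block over the items it actually contains is my assumption, not something you specified:

```python
def block_averages(values, size):
    """Average consecutive, non-overlapping blocks of `size` items.

    The final block may hold fewer than `size` items; it is averaged
    over however many items it actually contains.
    """
    averages = []
    for start in range(0, len(values), size):
        block = values[start:start + size]  # Slicing past the end is safe.
        averages.append(sum(block) / float(len(block)))
    return averages


data = [103, 104, 105, 109, 111, 112, 115, 120, 123, 128]

print(block_averages(data, 5))  # [106.4, 119.6]
print(block_averages(data, 4))  # [105.25, 114.5, 125.5]
```

In the second call the last block holds only two items (123 and 128), so its average is 125.5. If you would rather discard an incomplete block, that is a one-line change, but you need to decide which behaviour you want first.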




--
Steven