On 07/19/2013 04:00 PM, Peter Otten wrote:
Sivaram Neelakantan wrote:

I've got some stock indices data that I plan to plot using matplotlib.
The data is simply date, idx_close_value, and my plan is to plot the
last 30-day, 90-day, 180-day, and all-time graphs of the indices.

a) I can do the date computations using the python date libs
b) plotting with matplotlib, I can get that done

What is the best way to split the file into the last 30-day and 90-day
records when the data is in increasing time order?  My initial thinking
is to first reverse the file, then append to the 30/90/180-day lists
every record whose date is later than the computed cutoff for the
corresponding window.

Is that the way to go or is there a better way?

I'd start with a single list for the complete data, reverse that using the
aptly named method and then create the three smaller lists using slicing.

For example:

>>> stock_data = range(10)
>>> stock_data.reverse()
>>> stock_data
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>> stock_data[:3]  # the last three days
[9, 8, 7]

On second thought I don't see why you want to reverse the data. If you omit
that step you need to modify the slicing:

>>> stock_data = range(10)
>>> stock_data[-3:]  # the last three days
[7, 8, 9]
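
Since the real records carry dates rather than one evenly spaced entry per position, a date-based selection may fit better than a fixed slice. The following is a minimal sketch (not from the original post) assuming the data is already loaded as (date, close) tuples in increasing date order; the last_n_days() helper is hypothetical:

import datetime
from bisect import bisect_left

def last_n_days(records, n):
    # First index whose date is on or after the cutoff; the data is
    # already sorted, so a single slice gives the whole window.
    cutoff = records[-1][0] - datetime.timedelta(days=n)
    dates = [rec[0] for rec in records]
    return records[bisect_left(dates, cutoff):]

stock_data = [(datetime.date(2013, 1, 1) + datetime.timedelta(days=i), 100.0 + i)
              for i in range(365)]
last_30 = last_n_days(stock_data, 30)
last_90 = last_n_days(stock_data, 90)
last_180 = last_n_days(stock_data, 180)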



I see Alan has assumed that the data is already divided into day-sized hunks, so that subscripting those hunks is possible. He also assumed all the data will fit in memory at one time.

But in my envisioning of your description, I pictured a variable number of records per day, with each record being a variable-length stream of bytes starting with a length field. I pictured needing to handle a month with either zero entries or 3 billion entries. And even if a month is reasonable, I pictured the file as having 10 years of spurious data before you get to the 180-day point.
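
For a format like that (an assumption on my part; the real layout is not known here), the records could be read with a small length-prefixed reader. A hypothetical sketch, assuming a 4-byte big-endian length field before each payload:

import struct

def iter_records(path):
    # Yield each record's payload from a file of length-prefixed records.
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break                 # end of file
            (length,) = struct.unpack(">I", header)
            payload = f.read(length)
            if len(payload) < length:
                break                 # truncated final record
            yield payload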

Are you looking for an optimal solution, or just one that works? What order do you want the final data to be in? How is the data organized on disk? Is each record a fixed size? If so, you can efficiently do a binary search in the file to find the 30-, 90-, and 180-day points.

Once you determine the offsets in the file for those 180-, 90-, and 30-day points, it's a simple matter to seek to one such spot and process all the records that follow. Most records need never be read from disk at all.
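
Here is a minimal sketch of that binary search, assuming fixed-size records sorted by date; record_size and parse_date() are hypothetical, and parse_date() must pull the date out of a record's raw bytes:

import os

def first_offset_on_or_after(path, cutoff, record_size, parse_date):
    # Binary search over record indices; returns the byte offset of the
    # first record whose date is on or after the cutoff.
    n_records = os.path.getsize(path) // record_size
    lo, hi = 0, n_records
    with open(path, "rb") as f:
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * record_size)
            rec = f.read(record_size)
            if parse_date(rec) < cutoff:
                lo = mid + 1
            else:
                hi = mid
    return lo * record_size

Once you have an offset you seek there and read records until end of file; everything before the offset never leaves the disk.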

If the records are not fixed length, you can still do the same thing, but you will need one complete pass through the file to find those same three offsets.
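
A one-pass sketch for that case, reusing the hypothetical iter_records() and parse_date() helpers from above; it remembers the first byte offset at which each window begins:

def find_window_offsets(path, cutoffs, parse_date):
    # cutoffs maps a label (30, 90, 180) to its cutoff date; the result
    # maps each label to the offset of the first record on or after it.
    offsets = {}
    pos = 0
    for payload in iter_records(path):
        date = parse_date(payload)
        for label, cutoff in cutoffs.items():
            if label not in offsets and date >= cutoff:
                offsets[label] = pos
        pos += 4 + len(payload)       # 4-byte length field plus payload
    return offsets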

--
DaveA
