On 07/19/2013 04:00 PM, Peter Otten wrote:
Sivaram Neelakantan wrote:

I've got some stock indices data that I plan to plot using matplotlib.
The data is simply date, idx_close_value, and my plan is to plot the
last 30-day, 90-day, 180-day, and all-time graphs of the indices.

a) I can do the date computations using the python date libs
b) plotting with matplotlib, I can get that done

What is the best way to split the file into the last 30-day and 90-day
records when the data is in increasing time order?  My initial thinking
is to first reverse the file, then append to the 30/90/180-day lists
every record whose date is later than the computed cutoff for the
corresponding window.

Is that the way to go or is there a better way?

I'd start with a single list for the complete data, reverse that using the
aptly named method and then create the three smaller lists using slicing.

For example:

>>> stock_data = range(10)
>>> stock_data.reverse()
>>> stock_data
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>> stock_data[:3]  # the last three days
[9, 8, 7]

On second thought I don't see why you want to reverse the data. If you omit
that step you need to modify the slicing:

>>> stock_data = range(10)
>>> stock_data[-3:]  # the last three days
[7, 8, 9]
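
Since the real records carry dates rather than one evenly spaced entry per position, a date-based selection may fit better than a fixed slice. The following is a minimal sketch (not from the original post) assuming the data is already loaded as (date, close) tuples in increasing date order; the last_n_days() helper is hypothetical:

import datetime
from bisect import bisect_left

def last_n_days(records, n):
    # First index whose date is on or after the cutoff; the data is
    # already sorted, so a single slice gives the whole window.
    cutoff = records[-1][0] - datetime.timedelta(days=n)
    dates = [rec[0] for rec in records]
    return records[bisect_left(dates, cutoff):]

stock_data = [(datetime.date(2013, 1, 1) + datetime.timedelta(days=i), 100.0 + i)
              for i in range(365)]
last_30 = last_n_days(stock_data, 30)
last_90 = last_n_days(stock_data, 90)
last_180 = last_n_days(stock_data, 180)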



I see Alan has assumed that the data is already divided into day-sized hunks, so that subscripting those hunks is possible. He also assumed all the data will fit in memory at one time.

But in my envisioning of your description, I pictured a variable number of records per day, with each record being a variable-length stream of bytes starting with a length field. I pictured needing to handle a month with either zero entries or 3 billion entries. And even if a month is reasonable, I pictured the file as having 10 years of spurious data before you get to the 180-day point.
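
For a format like that (an assumption on my part; the real layout is not known here), the records could be read with a small length-prefixed reader. A hypothetical sketch, assuming a 4-byte big-endian length field before each payload:

import struct

def iter_records(path):
    # Yield each record's payload from a file of length-prefixed records.
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break                 # end of file
            (length,) = struct.unpack(">I", header)
            payload = f.read(length)
            if len(payload) < length:
                break                 # truncated final record
            yield payload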

Are you looking for an optimal solution, or just one that works? What order do you want the final data to be in? How is the data organized on disk? Is each record a fixed size? If so, you can efficiently do a binary search in the file to find the 30-, 90-, and 180-day points.

Once you determine the offsets in the file for those 180-, 90-, and 30-day points, it's a simple matter to seek to one such spot and process all the records that follow. Most records need never be read from disk at all.
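
Here is a minimal sketch of that binary search, assuming fixed-size records sorted by date; record_size and parse_date() are hypothetical, and parse_date() must pull the date out of a record's raw bytes:

import os

def first_offset_on_or_after(path, cutoff, record_size, parse_date):
    # Binary search over record indices; returns the byte offset of the
    # first record whose date is on or after the cutoff.
    n_records = os.path.getsize(path) // record_size
    lo, hi = 0, n_records
    with open(path, "rb") as f:
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * record_size)
            rec = f.read(record_size)
            if parse_date(rec) < cutoff:
                lo = mid + 1
            else:
                hi = mid
    return lo * record_size

Once you have an offset you seek there and read records until end of file; everything before the offset never leaves the disk.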

If the records are not fixed length, you can still do the same thing, but you will need one complete pass through the file to find those same three offsets.
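
A one-pass sketch for that case, reusing the hypothetical iter_records() and parse_date() helpers from above; it remembers the first byte offset at which each window begins:

def find_window_offsets(path, cutoffs, parse_date):
    # cutoffs maps a label (30, 90, 180) to its cutoff date; the result
    # maps each label to the offset of the first record on or after it.
    offsets = {}
    pos = 0
    for payload in iter_records(path):
        date = parse_date(payload)
        for label, cutoff in cutoffs.items():
            if label not in offsets and date >= cutoff:
                offsets[label] = pos
        pos += 4 + len(payload)       # 4-byte length field plus payload
    return offsets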

--
DaveA
