Dave, Alan - thanks for replying. We have a box with 16GB of RAM, so memory should hopefully not be an issue.
The datastore is Cassandra, and I'm hoping to use the pycassa library to talk to it. I do have an additional question related to Cassandra and Python.

As part of the data processing, I need to fetch slices of data from Cassandra and run computations such as sum and percentile calculations on them. The sum, along with other attributes, needs to be stored back in another Cassandra table that will be queried by end users of a reporting system. Cassandra does not provide any aggregation functions, so we will precompute the aggregations and store them in Cassandra.

Some of the data slices could return a lot of rows (e.g. 750,000 to 1 million), and since I need to compute a sum and a percentile, I have to consider every row in the slice. I am planning to do this in Python. Do you foresee any issues with this approach? Any advice on this will be greatly appreciated.

Thanks a ton!

On Tue, Oct 8, 2013 at 2:28 PM, Dave Angel <da...@davea.name> wrote:

> On 8/10/2013 16:46, Leena Gupta wrote:
>
> > Hello,
> >
> > Looking for some inputs on Python's csv processing feature.
> >
> > I need to process a large csv file every 5-10 minutes. The file could
> > contain 3mill to 10 mill rows and size could be 6MB to 10MB(+). As part of
> > the processing, I need to sum up a number value by grouping on certain
> > attributes and store the output in a datastore. I wanted to know if Python
> > is recommended and can it be used for processing data in csv files of this
> > size? Any issues that we need to be aware of? I believe Python has a csv
> > library as well.
> >
> > Thanks!
>
> Please use text messages here, not html. It not only wastes space, but
> frequently messes up formatting.
>
> Python's csv logic should have no problem dealing with a file of 10
> million rows. As long as you're not trying to keep all 10 million of
> them in some internal data structure, the csv logic will deal you a row
> at a time, in a most incremental fashion.
>
> Just make sure the particular datastore you require is supported in
> Python.
>
> --
> DaveA
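The sum and percentile step asked about at the top of this message can be sketched roughly as below. This is only an illustration under assumptions that are not in the thread: the rows have already been fetched from Cassandra and arrive as (group_key, value) pairs, the percentile uses the nearest-rank definition with a whole-number percentage, and the names `percentile` and `aggregate` are made up for the example. No pycassa calls are shown.

```python
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list.

    pct is assumed to be a whole number from 1 to 100.
    """
    if not sorted_values:
        raise ValueError("percentile of an empty slice is undefined")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[rank - 1]


def aggregate(rows, pct=95):
    """Return {group_key: (total, percentile)} for (group_key, value) rows.

    A percentile needs every value of a group, so the values are kept
    in a list per group and sorted once at the end.
    """
    values_by_group = defaultdict(list)
    for key, value in rows:
        values_by_group[key].append(value)

    summary = {}
    for key, values in values_by_group.items():
        values.sort()
        summary[key] = (sum(values), percentile(values, pct))
    return summary


# Hypothetical stand-in for rows fetched from the datastore.
sample_rows = [("a", 40), ("a", 10), ("b", 5), ("a", 30), ("a", 20)]
summary = aggregate(sample_rows, pct=50)
print(summary["a"])  # (total, 50th percentile) for group "a"
```

Because `aggregate` accepts any iterable, the rows can be fed from a generator that reads the slice in batches. The sum alone could be accumulated without keeping the values, but an exact percentile needs all of them; in CPython a list of about a million numbers takes on the order of tens of megabytes, which is small next to 16GB.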
_______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor