Re: Reading a large csv file

2009-07-15 Thread drozzy
On Jun 26, 6:47 am, Mag Gam wrote:
> Thank you everyone for the responses! I took some of your suggestions
> and my loading sped up by 25%

What a useless post...

Re: Reading a large csv file

2009-06-26 Thread Mag Gam
Thank you everyone for the responses! I took some of your suggestions and my loading sped up by 25%.

On Wed, Jun 24, 2009 at 3:57 PM, Lie Ryan wrote:
> Mag Gam wrote:
>> Sorry for the delayed response. I was trying to figure this problem
>> out. The OS is Linux, BTW
>
> Maybe I'm just being pedantic...

Re: Reading a large csv file

2009-06-24 Thread Lie Ryan
Mag Gam wrote:
> Sorry for the delayed response. I was trying to figure this problem
> out. The OS is Linux, BTW

Maybe I'm just being pedantic, but saying your OS is Linux means little, as there are hundreds of variants (distros) of Linux. (Not to mention that Linux is a kernel, not a full-blown OS...

Re: Reading a large csv file

2009-06-24 Thread skip
Mag> s=0
Mag> #Takes the longest here
Mag> for y in fs:
Mag>     continue
Mag>     a=y.split(',')
Mag>     s=s+1
Mag>
Mag> dset.resize(s,axis=0)
Mag> fs.close()
Mag> f.close()
Mag>
Mag> This works but just takes a VERY long time.
Mag> Any way to optimize this?
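
If that slow first pass really only exists to count rows, one alternative (not from this thread, just a common trick) is to read the gzip stream in large raw blocks and count newlines, which avoids the per-line iteration overhead. A minimal sketch; the file path and block size are placeholders:

import gzip

def count_rows(path, blocksize=1024 * 1024):
    # Count newline-terminated rows by scanning raw blocks instead of
    # iterating over the file line by line.
    rows = 0
    with gzip.open(path, "rb") as fs:
        while True:
            block = fs.read(blocksize)
            if not block:
                break
            rows += block.count(b"\n")
    return rows

print(count_rows("data.csv.gz"))    # "data.csv.gz" is a placeholder path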

Re: Reading a large csv file

2009-06-24 Thread Mag Gam
Sorry for the delayed response. I was trying to figure this problem out. The OS is Linux, BTW

Here is some code I have:

import numpy as np
from numpy import *
import gzip
import h5py
import re
import sys, string, time, getopt
import os

src=sys.argv[1]
fs = gzip.open(src)
x=src.split("/")
filena...
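
The script above is cut off in the archive. Purely as an illustration (the output name, column count, and dtype below are assumptions, not taken from the original post), a script like this typically goes on to derive a name from the path and create a resizable h5py dataset that can grow as rows are appended:

import os
import sys
import gzip
import h5py

src = sys.argv[1]
fs = gzip.open(src, "rt")
filename = os.path.basename(src)          # hypothetical: name the output after the input

ncols = 10                                # assumed number of CSV columns
f = h5py.File(filename + ".hdf5", "w")
dset = f.create_dataset("dset", shape=(0, ncols),
                        maxshape=(None, ncols),   # resizable along the row axis
                        dtype="float64", chunks=True)
# ... the row-loading loop would follow here ...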

Re: Reading a large csv file

2009-06-24 Thread Chris Withers
Terry Reedy wrote:
> Mag Gam wrote:
>> Yes, the system has 64Gig of physical memory.
>
> drool ;-).

Well, except that, depending on what OS he's using, the size of one process may well still be limited to 2GB...

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http:...

Re: Reading a large csv file

2009-06-23 Thread python
Mag,

If your source data is clean, it may also be faster for you to parse your input files directly rather than using the csv module, which may(?) add some overhead. Check out the struct module and/or use the split() method of strings.

We do a lot of ETL processing with flat files and on a slow single core...
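
A small illustration of that trade-off, using made-up sample data: csv.reader handles quoting and embedded delimiters, while str.split is enough (and usually a bit faster) when the fields are guaranteed to be plain:

import csv
import io

raw = "1,2.5,3\n4,5.5,6\n"          # made-up sample rows

# csv module: copes with quoted fields, embedded commas, etc.
rows_csv = list(csv.reader(io.StringIO(raw)))

# plain split: fine only when fields never contain the delimiter or quotes
rows_split = [line.split(",") for line in raw.splitlines()]

assert rows_csv == rows_split       # identical output for this simple data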

Re: Reading a large csv file

2009-06-23 Thread Terry Reedy
Mag Gam wrote:
> Yes, the system has 64Gig of physical memory.

drool ;-).

> What I meant was, is it possible to load to a hdf5 dataformat
> (basically NumPy array) without reading the entire file at first? I
> would like to splay to disk beforehand so it would be a bit faster
> instead of having 2 copies...

Re: Reading a large csv file

2009-06-22 Thread Peter Otten
Mag Gam wrote:
> Yes, the system has 64Gig of physical memory.
>
> What I meant was, is it possible to load to a hdf5 dataformat
> (basically NumPy array) without reading the entire file at first? I
> would like to splay to disk beforehand so it would be a bit faster
> instead of having 2 copies...

Re: Reading a large csv file

2009-06-22 Thread Mag Gam
Yes, the system has 64Gig of physical memory.

What I meant was, is it possible to load to a hdf5 dataformat (basically NumPy array) without reading the entire file at first? I would like to splay to disk beforehand so it would be a bit faster instead of having 2 copies in memory.

On Tue, Jun...
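
For what it's worth, h5py does let you write straight to the on-disk dataset, so the full array never has to exist in memory: if the row count is known (for example from a cheap counting pass), the dataset can be created at its final size up front and filled a slice at a time. A rough sketch along those lines; the sizes, dtype, and file name are assumptions:

import numpy as np
import h5py

nrows, ncols = 1000000, 10            # assumed sizes, e.g. from a first counting pass

f = h5py.File("out.hdf5", "w")        # placeholder output name
dset = f.create_dataset("dset", shape=(nrows, ncols), dtype="float64")

# Assigning a slice writes that block straight to disk; only `chunk` lives in RAM.
chunk = np.zeros((1000, ncols))
dset[0:1000] = chunk
f.close()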

Re: Reading a large csv file

2009-06-22 Thread Horace Blegg
Do you even HAVE 14 gigs of memory? I can imagine that if the OS needs to start writing to the page file, things are going to slow down.

Re: Reading a large csv file

2009-06-22 Thread Steven D'Aprano
On Mon, 22 Jun 2009 23:17:22 -0400, Mag Gam wrote:
> Hello All,
>
> I have a very large csv file 14G and I am planning to move all of my
> data to hdf5.
[...]
> I was wondering if anyone knows of any techniques to load this file
> faster?

Faster than what? What are you using to load the file?
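
To answer "faster than what?", it helps to have a baseline: time how long it takes just to decompress and iterate over the file, before any parsing or HDF5 work is involved. A minimal sketch, with a placeholder file name:

import gzip
import time

t0 = time.time()
n = 0
with gzip.open("data.csv.gz", "rt") as fs:    # placeholder path
    for line in fs:
        n += 1
print("%d lines in %.1f seconds" % (n, time.time() - t0))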

Reading a large csv file

2009-06-22 Thread Mag Gam
Hello All,

I have a very large csv file (14G) and I am planning to move all of my data to hdf5. I am using h5py to load the data. The biggest problem I am having is that I am putting the entire file into memory and then creating a dataset from it. This is very inefficient, and it takes over 4 hours to create...
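
For reference, the usual way around holding everything in memory is to stream the file and append to a resizable h5py dataset in buffered batches, so only one batch is in RAM at a time. This is only a sketch of that idea; the file names, column count, and batch size are assumptions, not from the post, and it assumes purely numeric rows:

import gzip
import numpy as np
import h5py

ncols = 10                             # assumed number of columns
batch = 100000                         # assumed rows per write

f = h5py.File("out.hdf5", "w")         # placeholder output name
dset = f.create_dataset("dset", shape=(0, ncols),
                        maxshape=(None, ncols), dtype="float64",
                        chunks=True)

buf, written = [], 0
with gzip.open("data.csv.gz", "rt") as fs:    # placeholder input path
    for line in fs:
        buf.append([float(v) for v in line.split(",")])
        if len(buf) == batch:
            dset.resize(written + len(buf), axis=0)
            dset[written:] = np.asarray(buf)   # flushed to disk, not kept in RAM
            written += len(buf)
            buf = []
if buf:                                        # write the final partial batch
    dset.resize(written + len(buf), axis=0)
    dset[written:] = np.asarray(buf)
f.close()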