Re: Memory efficient tuple storage

2009-03-19 Thread psaff...@googlemail.com
In the end, I used a cStringIO object to store the chromosomes - because there are only 23, I can use one character for each chromosome and represent the whole lot with a giant string and a dictionary to say what each character means. Then I used numpy arrays for the data and coordinates. This
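The scheme described in this reply can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the poster's code: `io.BytesIO` stands in for the Python 2 `cStringIO` module, stdlib `array.array` stands in for the numpy arrays, and the sample rows and the helper `coordinate()` are hypothetical.

```python
import io
from array import array

# Hypothetical input rows: (chromosome, position, data point).
rows = [("chr1", 1000, 0.5), ("chr2", 2000, 1.25), ("chr1", 3000, 2.0)]

# One single-byte code per chromosome (the post notes there are only 23).
code_for = {}               # chromosome name -> 1-byte code
name_for = {}               # 1-byte code -> chromosome name
chromo_buf = io.BytesIO()   # Python 3 stand-in for cStringIO
positions = array("i")      # stdlib stand-in for the numpy coordinate array
values = array("f")         # stdlib stand-in for the numpy data array

for chromo, pos, value in rows:
    code = code_for.get(chromo)
    if code is None:
        # Assign the next unused byte value (0, 1, 2, ...).
        code = bytes([len(code_for)])
        code_for[chromo] = code
        name_for[code] = chromo
    chromo_buf.write(code)
    positions.append(pos)
    values.append(value)

chromo_codes = chromo_buf.getvalue()   # one byte per record


def coordinate(i):
    """Rebuild the (chromosome, position) tuple for record i on demand."""
    return (name_for[chromo_codes[i:i + 1]], positions[i])
```

Each record then costs roughly one byte for the chromosome plus the fixed item sizes of the two arrays, instead of a separate tuple, string, int and float object.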

Memory efficient tuple storage

2009-03-13 Thread psaff...@googlemail.com
I'm reading in some rather large files (28 files each of 130MB). Each file is a genome coordinate (chromosome (string) and position (int)) and a data point (float). I want to read these into a list of coordinates (each a tuple of (chromosome, position)) and a list of data points. This has taught
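For reference, the list-of-tuples approach the original post describes might look like the sketch below. The whitespace-separated three-column layout, the sample text and the name `load_naive` are assumptions; the post does not show the file format.

```python
import io

# Hypothetical file layout (not shown in the post):
# one record per line: chromosome, position, data point.
sample = io.StringIO("chr1\t1000\t0.5\nchr2\t2000\t1.25\n")


def load_naive(handle):
    """Build the two lists the post describes: coordinate tuples and data points."""
    coords = []   # list of (chromosome, position) tuples
    data = []     # list of floats
    for line in handle:
        chromo, pos, value = line.split()
        coords.append((chromo, int(pos)))
        data.append(float(value))
    return coords, data


coords, data = load_naive(sample)
```

In CPython every record here becomes several separate heap objects (a tuple, a str, an int and a float), each with its own object header, plus a pointer slot in each list.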

Re: Memory efficient tuple storage

2009-03-13 Thread Kurt Smith
On Fri, Mar 13, 2009 at 10:59 AM, psaff...@googlemail.com wrote: I'm reading in some rather large files (28 files each of 130MB). Each file is a genome coordinate (chromosome (string) and position (int)) and a data point (float). I want to read these into a list of

Re: Memory efficient tuple storage

2009-03-13 Thread Tim Wintle
On Fri, 2009-03-13 at 08:59 -0700, psaff...@googlemail.com wrote: I'm reading in some rather large files (28 files each of 130MB). Each file is a genome coordinate (chromosome (string) and position (int)) and a data point (float). I want to read these into a list of coordinates (each a tuple

Re: Memory efficient tuple storage

2009-03-13 Thread Kurt Smith
On Fri, Mar 13, 2009 at 11:33 AM, Kurt Smith kwmsm...@gmail.com wrote: [snip OP] Assuming your data is in a plaintext file something like 'genomedata.txt' below, the following will load it into a numpy array with a customized dtype.  You can access the different fields by name ('chromo',
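A minimal sketch of the structured-dtype idea, assuming numpy is installed and the file holds whitespace-separated chromosome/position/value columns. The field names come from the dtype quoted later in the thread; the field widths and the in-memory sample text (standing in for 'genomedata.txt') are assumptions.

```python
import io

import numpy as np

# Stand-in for 'genomedata.txt' (layout assumed, not shown in the thread).
text = io.StringIO("chr1 1000 0.5\nchr2 2000 1.25\nchr1 3000 2.0\n")

# One fixed-size record per row; the widths/types are illustrative choices.
dt = np.dtype([("chromo", "U5"), ("position", "i4"), ("dpoint", "f4")])
arr = np.loadtxt(text, dtype=dt)

print(arr["position"])   # a whole column, accessed by field name
print(arr[1])            # a single record
```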

Re: Memory efficient tuple storage

2009-03-13 Thread Tim Chase
While Kurt gave some excellent ideas for using numpy, there were some missing details in your original post that might help folks come up with a work smarter, not harder solution. Clearly, you're not loading it into memory just for giggles -- surely you're *doing* something with it once it's
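Tim's message is cut off before he proposes anything specific, but one common way to avoid holding every record in memory is a single streaming pass. The per-chromosome mean below is a hypothetical stand-in for whatever the original poster actually computes; it only illustrates keeping running totals instead of the records themselves.

```python
import io
from collections import defaultdict


def mean_per_chromosome(handle):
    """Single pass over the records; only running totals are kept in memory."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in handle:
        chromo, _pos, value = line.split()
        totals[chromo] += float(value)
        counts[chromo] += 1
    return {c: totals[c] / counts[c] for c in totals}


# Hypothetical sample input in an assumed three-column layout.
sample = io.StringIO("chr1 1000 1.0\nchr2 2000 3.0\nchr1 3000 2.0\n")
means = mean_per_chromosome(sample)
```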

Re: Memory efficient tuple storage

2009-03-13 Thread psaff...@googlemail.com
Thanks for all the replies. First of all, can anybody recommend a good way to show memory usage? I tried heapy, but couldn't make much sense of the output and it didn't seem to change too much for different usages. Maybe I was just making the h.heap() call in the wrong place. I also tried
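heapy was the tool tried here; on Python 3.4 and later the standard library's `tracemalloc` module is another option (it did not exist in 2009). A minimal sketch, where the helper `measure_peak` and the measured workload are hypothetical:

```python
import tracemalloc


def measure_peak(build):
    """Return (result, peak bytes traced while build() ran)."""
    tracemalloc.start()
    try:
        result = build()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Example workload: 100,000 coordinate tuples.
coords, peak = measure_peak(lambda: [("chr1", i) for i in range(100_000)])
print(f"{len(coords)} tuples, peak {peak / 1e6:.1f} MB")
```

`tracemalloc` only sees memory allocated through Python's allocators, so it reports Python-level allocations rather than the process's total footprint.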

Re: Memory efficient tuple storage

2009-03-13 Thread Benjamin Peterson
psaffrey at googlemail.com writes: First of all, can anybody recommend a good way to show memory usage? Python 2.6 has a function called sys.getsizeof(). -- http://mail.python.org/mailman/listinfo/python-list
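`sys.getsizeof()` reports the size of the object itself only, not of the objects it references, so measuring a coordinate tuple takes a little extra work. A minimal sketch (the sample record is hypothetical):

```python
import sys

record = ("chr1", 123456)

shallow = sys.getsizeof(record)   # the tuple object only
# getsizeof does not follow references, so add the contained objects by hand.
deep = shallow + sum(sys.getsizeof(item) for item in record)
print(shallow, deep)
```

The manual sum assumes the tuple's items are not shared with other records; interned strings or cached small ints would be overcounted.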

Re: Memory efficient tuple storage

2009-03-13 Thread Gabriel Genellina
On Fri, 13 Mar 2009 14:49:51 -0200, Tim Wintle tim.win...@teamrubber.com wrote: If the same chromosome string is being used multiple times then you may find it more efficient to reference the same string, so you don't need to have multiple copies of the same string in memory. That may be
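Tim Wintle's suggestion, quoted in this message, can be sketched with `sys.intern` (the Python 3 location of Python 2's built-in `intern()`). The `parse` helper and the line format are hypothetical:

```python
import sys


def parse(line):
    """Split a hypothetical 'chromosome position' line into a coordinate tuple."""
    chromo, pos = line.split()
    # sys.intern returns one canonical str object per distinct value,
    # so every record refers to the same "chr1" instead of a fresh copy.
    return (sys.intern(chromo), int(pos))


a = parse("chr1 100")
b = parse("chr1 200")
print(a[0] is b[0])  # True: both tuples share one string object
```

A plain dict used as a cache (`cache.setdefault(s, s)`) achieves the same sharing for objects that cannot be interned.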

Re: Memory efficient tuple storage

2009-03-13 Thread Kurt Smith
On Fri, Mar 13, 2009 at 1:13 PM, psaff...@googlemail.com wrote: Thanks for all the replies. [snip] The numpy solution does work, but it uses more than 1GB of memory for one of my 130MB files. I'm using np.dtype({'names': ['chromo', 'position', 'dpoint'], 'formats':
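Because a structured dtype stores every record inline at a fixed size, the memory the finished array needs is `itemsize` times the number of rows, which gives a baseline to compare the reported 1GB figure against. The field names follow the quoted dtype; the formats and the record count are assumptions, since the quoted 'formats' list is cut off.

```python
import numpy as np

# Field names as quoted; the formats are assumed (the original list is cut off).
dt = np.dtype({"names": ["chromo", "position", "dpoint"],
               "formats": ["S2", "i4", "f4"]})

print(dt.itemsize)  # 10: bytes per record in the default packed layout

# Rough size of the finished array for a hypothetical record count.
n_records = 5_000_000
print(n_records * dt.itemsize / 1e6, "MB")
```

Note that numpy's `'U'` (unicode) fields take 4 bytes per character, so a bytes field such as `'S2'` is the more compact choice for short ASCII labels.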

Re: Memory efficient tuple storage

2009-03-13 Thread Paul Rubin
psaff...@googlemail.com writes: However, I still need the coordinates. If I don't keep them in a list, where can I keep them? See the docs for the array module: http://docs.python.org/library/array.html
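Following the pointer to the `array` module, positions can be kept as C integers stored inline instead of as a list of separate int objects. A minimal sketch comparing the two (the position values are made up):

```python
import sys
from array import array

positions_list = list(range(1_000_000, 1_000_100))
positions_array = array("i", positions_list)  # C ints stored inline

print(positions_array.itemsize)        # bytes per element (commonly 4)
print(sys.getsizeof(positions_array))  # object header + inline buffer
# A list holds pointers; each int is a separate object with its own header.
print(sys.getsizeof(positions_list)
      + sum(sys.getsizeof(p) for p in positions_list))
```

A (chromosome, position) coordinate can then be represented as the same index into parallel arrays rather than as a tuple object.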

Re: Memory efficient tuple storage

2009-03-13 Thread Aaron Brady
On Mar 13, 1:13 pm, psaff...@googlemail.com wrote: Thanks for all the replies. First of all, can anybody recommend a good way to show memory usage? I tried heapy, but couldn't make much sense of the output and it didn't seem to change too much for different usages.