In the end, I used a cStringIO object to store the chromosomes -
because there are only 23, I can use one character for each chromosome
and represent the whole lot with a giant string and a dictionary to
say what each character means. Then I used numpy arrays for the data
and coordinates. This
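A minimal sketch of the approach described above, written for Python 3, where io.StringIO stands in for cStringIO. The input format (whitespace-separated chromosome, position, value), the function name load_compact and the letter codes are assumptions for illustration, not the poster's actual code.

```python
import io

import numpy as np


def load_compact(lines):
    """Parse 'chromosome position value' lines into compact storage.

    Returns (codes, name_for, positions, values), where codes is one
    character per record and name_for maps each character back to the
    chromosome name. The line format is an assumption.
    """
    code_for = {}              # chromosome name -> single-character code
    chrom_buf = io.StringIO()  # one character appended per record
    positions = []             # temporary lists, converted to arrays below
    values = []
    for line in lines:
        chrom, pos, val = line.split()
        if chrom not in code_for:
            # Hand out 'A', 'B', ... in order of first appearance.
            code_for[chrom] = chr(ord("A") + len(code_for))
        chrom_buf.write(code_for[chrom])
        positions.append(int(pos))
        values.append(float(val))
    name_for = {code: name for name, code in code_for.items()}
    return (chrom_buf.getvalue(), name_for,
            np.array(positions, dtype=np.int64),
            np.array(values, dtype=np.float64))


sample = ["chr1 100 0.5", "chr1 250 1.25", "chrX 42 -3.0"]
codes, name_for, positions, values = load_compact(sample)
```

The chromosome of record i is then name_for[codes[i]], so each record costs one character for the chromosome plus two 8-byte array slots.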
I'm reading in some rather large files (28 files each of 130MB). Each
file is a genome coordinate (chromosome (string) and position (int))
and a data point (float). I want to read these into a list of
coordinates (each a tuple of (chromosome, position)) and a list of
data points.
This has taught
On Fri, Mar 13, 2009 at 10:59 AM, psaff...@googlemail.com wrote:
I'm reading in some rather large files (28 files each of 130MB). Each
file is a genome coordinate (chromosome (string) and position (int))
and a data point (float). I want to read these into a list of
On Fri, 2009-03-13 at 08:59 -0700, psaff...@googlemail.com wrote:
I'm reading in some rather large files (28 files each of 130MB). Each
file is a genome coordinate (chromosome (string) and position (int))
and a data point (float). I want to read these into a list of
coordinates (each a tuple
On Fri, Mar 13, 2009 at 11:33 AM, Kurt Smith kwmsm...@gmail.com wrote:
[snip OP]
Assuming your data is in a plaintext file something like
'genomedata.txt' below, the following will load it into a numpy array
with a customized dtype. You can access the different fields by name
('chromo',
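A rough sketch of what a structured-dtype load can look like, written for Python 3. The field names 'chromo', 'position' and 'dpoint' appear later in this thread; the field formats, the sample data and the manual line parsing are assumptions, not the code from this message.

```python
import numpy as np

# Hypothetical file contents: chromosome, position, data point per line.
text = """chr1 100 0.5
chr1 250 1.25
chrX 42 -3.0"""

# Assumed formats: a 5-character string, a 64-bit int and a 64-bit float.
dt = np.dtype({"names": ["chromo", "position", "dpoint"],
               "formats": ["U5", "i8", "f8"]})

# Parse each line into a tuple, then build one record array from them.
rows = [(c, int(p), float(d))
        for c, p, d in (line.split() for line in text.splitlines())]
records = np.array(rows, dtype=dt)

# Each field can be pulled out by name as an ordinary array.
all_positions = records["position"]
```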
While Kurt gave some excellent ideas for using numpy, there were
some missing details in your original post that might help folks
come up with a "work smarter, not harder" solution.
Clearly, you're not loading it into memory just for giggles --
surely you're *doing* something with it once it's
Thanks for all the replies.
First of all, can anybody recommend a good way to show memory usage? I
tried heapy, but couldn't make much sense of the output and it didn't
seem to change too much for different usages. Maybe I was just making
the h.heap() call in the wrong place. I also tried
psaffrey at googlemail.com writes:
First of all, can anybody recommend a good way to show memory usage?
Python 2.6 has a function called sys.getsizeof().
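A small illustration of how that function behaves (the sample list is made up). Note that sys.getsizeof() reports only the object itself, so for a container the contents have to be added separately.

```python
import sys

# Hypothetical sample of the (chromosome, position) tuples from the thread.
coords = [("chr1", 100), ("chr1", 250), ("chrX", 42)]

# Size of the list object alone: its header plus one pointer per element.
shallow = sys.getsizeof(coords)

# Adding the tuples gets closer to the real footprint, though this still
# leaves out the strings and ints that the tuples point to.
with_tuples = shallow + sum(sys.getsizeof(t) for t in coords)
```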
--
http://mail.python.org/mailman/listinfo/python-list
On Fri, 13 Mar 2009 14:49:51 -0200, Tim Wintle tim.win...@teamrubber.com
wrote:
If the same chromosome string is being used multiple times then you may
find it more efficient to reference the same string, so you don't need
to have multiple copies of the same string in memory. That may be
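One way to get that sharing, sketched for Python 3 with sys.intern() (Python 2 has a built-in intern() instead). The sample lines are made up, and this is only one possible technique, not necessarily the one this message goes on to suggest.

```python
import sys

# Hypothetical input lines: chromosome, position, data point.
lines = ["chr1 100 0.5", "chr1 250 1.25", "chrX 42 -3.0"]

# Each split() builds a fresh string object for the chromosome name.
raw = [line.split()[0] for line in lines]

# Interning maps equal strings to one shared object, so the list ends up
# holding repeated references rather than repeated copies.
shared = [sys.intern(name) for name in raw]
```

With 23 or so distinct names, the per-record cost then drops to one pointer each.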
On Fri, Mar 13, 2009 at 1:13 PM, psaff...@googlemail.com wrote:
Thanks for all the replies.
[snip]
The numpy solution does work, but it uses more than 1GB of memory for
one of my 130MB files. I'm using
np.dtype({'names': ['chromo', 'position', 'dpoint'], 'formats':
psaff...@googlemail.com writes:
However, I still need the coordinates. If I don't keep them in a list,
where can I keep them?
See the docs for the array module:
http://docs.python.org/library/array.html
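A short sketch of keeping the positions and data points in array objects instead of lists. The type codes, variable names and sample lines are assumptions for illustration.

```python
from array import array

# Hypothetical input lines: chromosome, position, data point.
lines = ["chr1 100 0.5", "chr1 250 1.25", "chrX 42 -3.0"]

positions = array("q")  # signed 64-bit integers, stored unboxed
dpoints = array("d")    # C doubles, stored unboxed

for line in lines:
    _chrom, pos, val = line.split()
    positions.append(int(pos))
    dpoints.append(float(val))
```

Each entry takes positions.itemsize bytes, with no separate Python object per number.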
On Mar 13, 1:13 pm, psaff...@googlemail.com wrote:
Thanks for all the replies.
First of all, can anybody recommend a good way to show memory usage? I
tried heapy, but couldn't make much sense of the output and it didn't
seem to change too much for different usages.