Re: [Pytables-users] Optimizing pytables for reading entire columns at a time

2012-09-24 Thread Ümit Seren
With CArrays you can only have one specific type for the array (int, float, etc.), whereas with a table each column can have a different type (string, float, etc.). If you want to replicate this with CArrays, you would need a separate CArray for each type. I think for storing numerical data whe
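A minimal sketch of the difference described above, assuming the PyTables 3.x snake_case API (the 2012 thread predates it and would have used `createTable`/`createCArray`). The `Sample` description, the node names and the temporary file are hypothetical and only serve the illustration.

```python
import os
import tempfile

import numpy as np
import tables


class Sample(tables.IsDescription):
    # Hypothetical columns: one table row can mix a string and a float.
    name = tables.StringCol(16)
    score = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "carray_vs_table.h5")
with tables.open_file(path, "w") as h5:
    # One table, two differently typed columns.
    table = h5.create_table("/", "samples", Sample)
    table.append([(b"a", 1.5), (b"b", 2.5)])
    table.flush()

    # A CArray has a single atom type, so each type needs its own array.
    names = h5.create_carray("/", "names", atom=tables.StringAtom(16), shape=(2,))
    scores = h5.create_carray("/", "scores", atom=tables.Float64Atom(), shape=(2,))
    names[:] = np.array([b"a", b"b"], dtype="S16")
    scores[:] = np.array([1.5, 2.5])

    table_dtype = table.dtype
    carray_dtypes = (names.dtype, scores.dtype)
    score_column = table.col("score")  # read one table column as an array
```

Whether the split-array layout is faster for whole-column reads depends on chunking and compression settings, which this sketch leaves at their defaults.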

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
Alted" : > On 7/18/12 4:11 PM, Ümit Seren wrote: > > Actually I had 30.000 groups in a parent group. > > Each of the 30.000 groups had maybe 3 datasets. > > So to be honest I never had 30.000 datasets in a single group. > > I guess you will probably have to disab

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
On 7/18/12 2:07 PM, Ümit Seren wrote: >> I actually had 30.000 groups attached to the data group. But I guess >> it doesn't really matter whether it is a table or a group. They both >> are nodes. > > 30.000 datasets attached to the same group? I'm interested in

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
linked to a similar node (in this case, data)? I seem to have a problem > putting that many nodes from one root. > > -Jacob > > > On Wed, Jul 18, 2012 at 6:54 AM, Ümit Seren wrote: >> >> On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett >> wrote: >> > S

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
-table - CArray - dataset2 . . . - dataset30.000 > If you could help me out with these two items, I think I will have enough > knowledge under my belt to know what I need to do. Thanks again! ;) > > > On Wed, Jul

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
rors that I get are occasional read errors (which isn't much of a > problem for me), so I am thinking. Could there be a way to reduce the > metadata within an hdf5 and at the same time, use a multi-tabled approach to > solve my problem? > > Thanks, > Jacob > > > On Wed,

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-17 Thread Ümit Seren
Just to add what Anthony said: In the end it also depends on how unrelated your data is and how you want to access it. If the access scenario is that you usually only search or select within a specific dataset, then splitting up the datasets and putting them into separate tables is the way to go. In RB

Re: [Pytables-users] Main differences between PyTables and Relational

2012-04-26 Thread Ümit Seren
Good points. Just some additional comments: I do think that scientific/hierarchical file formats like HDF5 and RDBMS systems have their specific use cases, and I don't think it makes sense to replace one with the other. I do also think that you shouldn't try to apply RDBMS principles to HDF5 like fo

Re: [Pytables-users] Help on sorting tables

2012-03-22 Thread Ümit Seren
I completely forgot about the CSI index. That's of course much easier than what I suggested ;-) On 22.03.2012 17:39, "Francesc Alted" wrote: > On 3/22/12 11:02 AM, sreeaurovindh viswanathan wrote: > > Hi, > > > > If I have three columns in a table and if I wish to sort based on one > > field and
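For readers unfamiliar with the CSI (completely sorted index) approach mentioned here, the following is a rough sketch under the assumption of the PyTables 3.x API: `Column.create_csindex()` builds the index and `Table.read_sorted()` returns the rows ordered by that column. The `Record` description and the data are made up for illustration.

```python
import os
import tempfile

import numpy as np
import tables


class Record(tables.IsDescription):
    # Hypothetical two-column table; "key" is the sort field.
    key = tables.Int32Col()
    value = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "csi_sort.h5")
with tables.open_file(path, "w") as h5:
    table = h5.create_table("/", "records", Record)

    # Fill the table with shuffled keys.
    rng = np.random.default_rng(0)
    data = np.empty(1000, dtype=table.dtype)
    data["key"] = rng.permutation(1000)
    data["value"] = data["key"] * 0.5
    table.append(data)
    table.flush()

    # Build a completely sorted index on the sort column, then read
    # the whole table back ordered by that column.
    table.cols.key.create_csindex()
    sorted_rows = table.read_sorted("key")
    sorted_keys = sorted_rows["key"].tolist()
```

`read_sorted()` loads the sorted result into memory; for tables that do not fit, `Table.itersorted()` iterates in sorted order instead.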

Re: [Pytables-users] Help on sorting tables

2012-03-22 Thread Ümit Seren
AFAIK there is no sort functionality built into PyTables. I think there are 4 ways to do it: 1.) load all 7.5 million records and sort them in memory (if they fit into memory) 2.) implement your own external sorting algorithm (http://en.wikipedia.org/wiki/External_sorting) using pytables iterato
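Option 2 above can be sketched with the standard library alone. This is a generic external merge sort over any iterable of picklable records, not code from the thread; the function names and the `chunk_size` parameter are hypothetical. Feeding it from a PyTables iterator would additionally require copying each row out of the iterator before buffering it.

```python
import heapq
import os
import pickle
import tempfile


def _write_run(sorted_chunk, tmpdir):
    """Write one sorted chunk (a "run") to its own temporary file."""
    fd, path = tempfile.mkstemp(dir=tmpdir)
    with os.fdopen(fd, "wb") as fh:
        for rec in sorted_chunk:
            pickle.dump(rec, fh)
    return path


def _read_run(path):
    """Lazily yield the records of one run, in the order they were written."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return


def external_sort(records, key, chunk_size=100_000):
    """Yield records in sorted order, holding at most chunk_size in memory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        runs, chunk = [], []
        for rec in records:
            chunk.append(rec)
            if len(chunk) >= chunk_size:
                runs.append(_write_run(sorted(chunk, key=key), tmpdir))
                chunk = []
        if chunk:
            runs.append(_write_run(sorted(chunk, key=key), tmpdir))
        # k-way merge of the sorted runs, reading one record per run at a time.
        yield from heapq.merge(*(_read_run(p) for p in runs), key=key)


rows = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]
result = list(external_sort(rows, key=lambda r: r[0], chunk_size=2))
```

With `chunk_size=2` the five rows are split into three runs on disk and merged back, which exercises the same path a multi-million-row input would take.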

Re: [Pytables-users] Question about reading a complete table.

2012-02-20 Thread Ümit Seren
I guess using the slice operator on the table should probably also load the entire table into memory: a = f.root.path.to.table[:] This will return a structured array though. On Mon, Feb 20, 2012 at 5:43 PM, Anthony Scopatz wrote: > Hello German, > > The easiest and probably the fastest way is t
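A short sketch of the slicing behaviour described above, assuming the PyTables 3.x API. The `Point` description and the file are hypothetical; the point is only that `table[:]` yields a NumPy structured array whose fields are the table columns.

```python
import os
import tempfile

import numpy as np
import tables


class Point(tables.IsDescription):
    # Hypothetical columns for the illustration.
    idx = tables.Int32Col()
    x = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "read_all.h5")
with tables.open_file(path, "w") as h5:
    table = h5.create_table("/", "points", Point)
    table.append([(1, 1.0), (2, 4.0)])
    table.flush()

    a = table[:]                     # whole table as a structured array
    b = table.read()                 # equivalent explicit call
    x_only = table.read(field="x")   # a single column as a plain array
```

Individual columns of the structured array are then available as `a["x"]`, which avoids a second read from disk.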

Re: [Pytables-users] Performance problems in indexed tables.

2012-01-23 Thread Ümit Seren
Because the profile output is probably not formatted properly in the mail, I attached the two line_profiler profile output files. In addition to this I also added the profile for the _table__whereIndexed() function. On Mon, Jan 23, 2012 at 12:13 PM, Ümit Seren wrote: > Hi Anthony > I di

Re: [Pytables-users] Performance problems in indexed tables.

2012-01-23 Thread Ümit Seren
ust my suspicion, but it would seem to give this > behavior.  Profiling would at least tell us which function or method is the > trouble maker. > > Do you have a script that reproduces this as a whole? > > Be Well > Anthony > > On Sat, Jan 21, 2012 at 7:23 AM, Ümit Sere

[Pytables-users] Performance problems in indexed tables.

2012-01-21 Thread Ümit Seren
I recently used ptrepack to compact my hdf5 file and forgot to activate the options to propagate indexes. Just out of curiosity I decided to compare performance between the two tables (one with index and one without) for some queries. The table structure looks like this: "gene_mid_pos": UInt32Col

Re: [Pytables-users] Write performance & iterating through nodes

2012-01-21 Thread Ümit Seren
I will add some code for benchmarking as soon as possible. On Fri, Jan 20, 2012 at 7:36 PM, Antonio Valentino wrote: > Hi Francesc, hi Ümit, > > On 20/01/2012 15:16, Francesc Alted wrote: >> 2012/1/20 Ümit Seren >> >>> So I played around a little bit fut

Re: [Pytables-users] Write performance & iterating through nodes

2012-01-20 Thread Ümit Seren
a while to 1 table/sec instead of 10 tables/sec. When I change it to NODE_CACHE_SLOTS=0 I don't have any performance problems. On Thu, Jan 19, 2012 at 7:43 AM, Francesc Alted wrote: > 2012/1/18 Ümit Seren >> >> Hi Francesc, >> I will try to get some numbers as soon a
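One way to apply the NODE_CACHE_SLOTS=0 setting mentioned above is to pass it as a keyword argument when opening the file; this sketch assumes the PyTables 3.x `open_file()` signature, which forwards such keywords as per-file parameter overrides. The `Row` description, the table names and the count of 20 tables are hypothetical stand-ins for the much larger file in the thread.

```python
import os
import tempfile

import tables


class Row(tables.IsDescription):
    # Hypothetical columns for the illustration.
    idx = tables.Int32Col()
    val = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "many_tables.h5")

# NODE_CACHE_SLOTS=0 disables the node cache for this file handle only.
with tables.open_file(path, "w", NODE_CACHE_SLOTS=0) as h5:
    for i in range(20):
        t = h5.create_table("/", f"table_{i}", Row)
        t.append([(i, float(i))])
        t.flush()
    n_tables = sum(1 for _ in h5.walk_nodes("/", classname="Table"))
```

Whether disabling the cache helps is workload-dependent; the thread reports it for a file with tens of thousands of tables, and this sketch makes no performance claim of its own.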

Re: [Pytables-users] Write performance & iterating through nodes

2012-01-18 Thread Ümit Seren
wrote: > 2012/1/17 Anthony Scopatz >> >> >> >> On Tue, Jan 17, 2012 at 4:35 AM, Ümit Seren wrote: >>> >>> @Anthony: >>> Thanks for the quick reply. >>> I fixed my problem (I will get to it later) but first to my previous >>>

Re: [Pytables-users] Write performance & iterating through nodes

2012-01-17 Thread Ümit Seren
rows for a specific result_type and then append them via table.append(). By doing this the performance doesn't degrade at all. Memory consumption is also reasonable. cheers Ümit P.S.: Sorry for writing this mail in this way. However I somehow didn't get your response directly via mail
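The buffering approach described above might look roughly like the following, assuming the PyTables 3.x API. The helper `append_in_batches`, its `batch_size` parameter and the `Hit` description are hypothetical; the original script and its result_type grouping are not shown in the snippet.

```python
import os
import tempfile

import tables


class Hit(tables.IsDescription):
    # Hypothetical columns for the illustration.
    position = tables.Int64Col()
    score = tables.Float64Col()


def append_in_batches(table, rows, batch_size=10_000):
    """Collect rows in a list and write them with one append() per batch."""
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= batch_size:
            table.append(buffer)
            buffer = []
    if buffer:
        table.append(buffer)  # write the final partial batch
    table.flush()


path = os.path.join(tempfile.mkdtemp(), "batched_append.h5")
with tables.open_file(path, "w") as h5:
    table = h5.create_table("/", "cache", Hit)
    append_in_batches(table, ((i, i * 0.1) for i in range(25)), batch_size=10)
    n_rows = table.nrows
    first_positions = table.read(field="position")[:3].tolist()
```

A larger `batch_size` means fewer `append()` calls at the cost of more memory held in the buffer.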

[Pytables-users] Write performance & iterating through nodes

2012-01-16 Thread Ümit Seren
I created an HDF5 file with PyTables which contains around 29 000 tables with around 31k rows each. I am trying to create a caching table in the same HDF5 file which contains a subset of those 29 000 tables. I wrote a script which basically iterates through each of the 29 000 tables, retrieves a sub