With CArrays you can only have one specific type for the whole array (int,
float, etc.), whereas with a Table each column can have a different type
(string, float, etc.). If you want to replicate this with CArrays, you
would have to use multiple CArrays, one per type.
I think for storing numerical data whe…
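A minimal sketch of the difference (file, node, and column names here are
made up; this uses the current tables API):

    import tables as tb

    with tb.open_file("example.h5", "w") as f:
        # A Table: every column can have its own type.
        f.create_table("/", "mixed", {"name": tb.StringCol(16),
                                      "score": tb.Float64Col(),
                                      "count": tb.UInt32Col()})
        # A CArray: a single atom (type) for the whole array.
        f.create_carray("/", "floats", atom=tb.Float64Atom(), shape=(1000,))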
Alted" :
> On 7/18/12 4:11 PM, Ümit Seren wrote:
> > Actually I had 30.000 groups in a parent group.
> > Each of the 30.000 groups had maybe 3 datasets.
> > So to be honest I never had 30.000 datasets in a single group.
> > I guess you will probably have to disable…
On 7/18/12 2:07 PM, Ümit Seren wrote:
>> I actually had 30.000 groups attached to the data group. But I guess
>> it doesn't really matter whether it is a table or a group. They both
>> are nodes.
>
> 30.000 datasets attached to the same group? I'm interested in how many
> nodes can be linked to a similar node (in this case, data)? I seem to have
> a problem putting that many nodes from one root.
>
> -Jacob
>
>
> On Wed, Jul 18, 2012 at 6:54 AM, Ümit Seren wrote:
>>
>> On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett
>> wrote:
>> > S…
- table
- CArray
- dataset2
  ...
- dataset30.000
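If I read that layout right, a sketch like this would build it (the names
are invented and the loop is cut down from 30.000 to a handful of groups):

    import tables as tb

    with tb.open_file("many_nodes.h5", "w") as f:
        data = f.create_group("/", "data")
        for i in range(1, 6):  # 30.000 in the real file
            g = f.create_group(data, "dataset%d" % i)
            f.create_table(g, "table", {"pos": tb.UInt32Col(),
                                        "value": tb.Float64Col()})
            f.create_carray(g, "carray", atom=tb.Float64Atom(), shape=(100,))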
> If you could help me out with these two items, I think I will have enough
> knowledge under my belt to know what I need to do. Thanks again! ;)
>
>
> On Wed, Jul …
> …errors that I get are occasional read errors (which isn't much of a
> problem for me), so I am thinking: could there be a way to reduce the
> metadata within an HDF5 file and at the same time use a multi-table
> approach to solve my problem?
>
> Thanks,
> Jacob
>
>
> On Wed, …
Just to add to what Anthony said:
In the end it also depends on how unrelated your data is and how you want
to access it. If the access scenario is that you usually only search
or select within a specific dataset, then splitting up the datasets and
putting them into separate tables is the way to go. In RB…
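For example, a selection inside one table only touches that node's data (a
sketch; the path and column names are placeholders):

    import tables as tb

    with tb.open_file("data.h5", "r") as f:
        t = f.root.data.dataset1.table  # hypothetical path
        hits = [r["value"] for r in t.where("pos > 1000")]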
Good points.
Just some additional comments:
I do think that scientific/hierarchical file formats like HDF5 and
RDBMS systems have their specific use cases, and I don't think it makes
sense to replace one with the other.
I do also think that you shouldn't try to apply RDBMS principles to
HDF5, like fo…
I completely forgot about the CSI index. That's of course much easier than
what I suggested ;-)
On 22.03.2012 17:39, "Francesc Alted" wrote:
> On 3/22/12 11:02 AM, sreeaurovindh viswanathan wrote:
> > Hi,
> >
> > If I have three columns in a table and if I wish to sort based on one
> > field and…
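For reference, a completely sorted index (CSI) lets newer PyTables versions
hand a table back in sorted order; roughly (the table path and column name
are made up):

    import tables as tb

    with tb.open_file("data.h5", "a") as f:
        t = f.root.mytable                    # hypothetical table
        t.cols.gene_mid_pos.create_csindex()  # build the full sort index once
        rows = t.read_sorted("gene_mid_pos")  # requires a CSI index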
AFAIK there is no sort functionality built into PyTables.
I think there are 4 ways to do it:
1.) load all 7.5 million records and sort them in memory (if they fit
into memory)
2.) implement your own external sorting algorithm
(http://en.wikipedia.org/wiki/External_sorting) using PyTables
iterators…
I guess using the slice operator on the table should probably also
load the entire table into memory:
a = f.root.path.to.table[:]
This will return a structured array, though.
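Option 1.) spelled out, for what it's worth (numpy sorts the structured
array in memory; the sort column is made up):

    import tables as tb

    with tb.open_file("data.h5", "r") as f:
        a = f.root.path.to.table[:]   # load the whole table into memory
        a.sort(order="gene_mid_pos")  # in-place sort on one field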
On Mon, Feb 20, 2012 at 5:43 PM, Anthony Scopatz wrote:
> Hello German,
>
> The easiest and probably the fastest way is t…
Because the profile output is probably not formatted properly in the
mail, I attached the two line_profiler output files. In addition
I also added the profile for the _table__whereIndexed() function.
On Mon, Jan 23, 2012 at 12:13 PM, Ümit Seren wrote:
> Hi Anthony
> I di…
> …just my suspicion, but it would seem to give this
> behavior. Profiling would at least tell us which function or method is the
> trouble maker.
>
> Do you have a script that reproduces this as a whole?
>
> Be Well
> Anthony
>
> On Sat, Jan 21, 2012 at 7:23 AM, Ümit Seren wrote: …
I recently used ptrepack to compact my HDF5 file and forgot to activate
the option to propagate indexes.
Just out of curiosity I decided to compare performance between the two
tables (one with index and one without) for some queries.
The table structure looks like this:
"gene_mid_pos": UInt32Col…
I will add some code for benchmarking as soon as possible.
On Fri, Jan 20, 2012 at 7:36 PM, Antonio Valentino
wrote:
> Hi Francesc, hi Ümit,
>
> On 20/01/2012 15:16, Francesc Alted wrote:
>> 2012/1/20 Ümit Seren
>>
>>> So I played around a little bit fut…
…a while to 1 table/sec instead of 10 tables/sec.
When I change it to NODE_CACHE_SLOTS=0 I don't have any performance problems.
On Thu, Jan 19, 2012 at 7:43 AM, Francesc Alted wrote:
> 2012/1/18 Ümit Seren
>>
>> Hi Francesc,
>> I will try to get some numbers as soon as…
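For anyone hitting the same slowdown: the cache size can be passed per file
(current API; the filename is a placeholder):

    import tables as tb

    # NODE_CACHE_SLOTS=0 disables the node cache.
    f = tb.open_file("many_tables.h5", "r", NODE_CACHE_SLOTS=0)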
…wrote:
> 2012/1/17 Anthony Scopatz
>>
>>
>>
>> On Tue, Jan 17, 2012 at 4:35 AM, Ümit Seren wrote:
>>>
>>> @Anthony:
>>> Thanks for the quick reply.
>>> I fixed my problem (I will get to it later) but first to my previous
>>> …
…rows for a specific result_type and then
append them via table.append().
By doing this the performance doesn't degrade at all. Memory
consumption is also reasonable.
cheers
Ümit
P.S.: Sorry for writing this mail in this way. However I somehow
didn't get your response directly via mail.
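The pattern described above in code, roughly (the table paths and the
condition are placeholders):

    import tables as tb

    with tb.open_file("data.h5", "a") as f:
        src = f.root.data.dataset1.table           # hypothetical source table
        cache = f.root.cache_table                 # hypothetical caching table
        rows = src.read_where("result_type == 1")  # fetch matching rows at once
        cache.append(rows)                         # single bulk append
        cache.flush()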
I created a hdf5 file with pytables which contains around 29 000
tables with around 31k rows each.
I am trying to create a caching table in the same hdf5 file which
contains a subset of those 29 000 tables.
I wrote a script which basically iterates through each of the 29 000
tables and retrieves a sub…