> > This may take a while, about 20 hours maybe. The partition has approx
> > 10GB, I can't afford more. Let's hope that this is sufficient.
>
> 20 hours seems rather long. Even if you have to worry about uniqueness
> constraints, there are ways to deal with that that should be much faster
> (deal with the data in chunks, load into temp tables, check for dupes,
> etc).

It is longer than necessary, that's true. I implemented it the way I did for 
internal logging purposes. I'm not yet convinced that those logging 
capabilities are worth the delay; I'm still in testing mode ;)
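
For reference, here is a minimal sketch of the chunk/temp-table/dedupe 
approach you describe, in Python with the sqlite3 module. The table and 
column names (variants, chrom, pos, allele) are made up for illustration; 
the real schema differs.

    import sqlite3

    conn = sqlite3.connect("genome.db")
    conn.execute("CREATE TABLE IF NOT EXISTS variants "
                 "(chrom TEXT, pos INTEGER, allele TEXT)")

    def load_chunk(conn, rows):
        # Stage the chunk in a temporary table, then copy over only the
        # rows that are not already present in the target table.
        conn.execute("CREATE TEMP TABLE staging "
                     "(chrom TEXT, pos INTEGER, allele TEXT)")
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
        conn.execute("""
            INSERT INTO variants (chrom, pos, allele)
            SELECT DISTINCT s.chrom, s.pos, s.allele
            FROM staging AS s
            WHERE NOT EXISTS (
                SELECT 1 FROM variants AS v
                WHERE v.chrom = s.chrom AND v.pos = s.pos
                  AND v.allele = s.allele)""")
        conn.execute("DROP TABLE staging")
        conn.commit()

    # Example: one chunk; in practice the chunks would come from the input files.
    load_chunk(conn, [("chr1", 12345, "A"), ("chr1", 12345, "A"),
                      ("chr2", 67890, "T")])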


> I can tell you that even 750M rows wouldn't be a huge deal for PostgreSQL,
> and 20G of data is nothing. Though your table would take somewhere
> around 30G due to the higher per-row overhead in PostgreSQL; I'm not
> really sure how large the indexes would be.

AFAIK, PostgreSQL uses a client-server architecture. For maintainability 
reasons, I'd like to avoid that.


> As for performance, I haven't seen a single mention of any kind of
> metrics you'd like to hit, so it's impossible to guess as to whether
> SQLite, PostgreSQL, or anything else would suffice. 

I posted a couple of timings a few days ago. As far as I can tell, SQLite's 
performance will suffice for my tasks, even on ordinary PC hardware =)


> As for partitioning, you might still have a win if you can identify some
> common groupings, and partition based on that. Even if you can't, you
> could at least get a win on single-person queries.

The data could easily be grouped by chromosome, but I would like to avoid 
that, too. I expect it would be something of a hassle to do multi-chromosome 
queries.
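
(If I ever did partition that way, I suppose a view over the per-chromosome 
tables would take most of the hassle out of multi-chromosome queries. A rough 
sketch in Python/sqlite3, with made-up table names:)

    import sqlite3

    conn = sqlite3.connect("genome.db")
    chromosomes = ["chr%d" % i for i in range(1, 23)]

    # One table per chromosome (hypothetical layout).
    for c in chromosomes:
        conn.execute("CREATE TABLE IF NOT EXISTS variants_%s "
                     "(pos INTEGER, allele TEXT)" % c)

    # A view that stitches the partitions back together, so a
    # multi-chromosome query reads like a query against one table.
    union = " UNION ALL ".join(
        "SELECT '%s' AS chrom, pos, allele FROM variants_%s" % (c, c)
        for c in chromosomes)
    conn.execute("CREATE VIEW IF NOT EXISTS all_variants AS " + union)

    # Single-chromosome queries hit only their own table;
    # this one spans all of them.
    rows = conn.execute(
        "SELECT chrom, pos FROM all_variants WHERE pos BETWEEN ? AND ?",
        (1000, 2000)).fetchall()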


Nevertheless, thanks for your input!

Regards
        Daniel
