> > This may take a while, about 20 hours maybe. The partition has approx
> > 10GB, I can't afford more. Let's hope that this is sufficient.
>
> 20 hours seems rather long. Even if you have to worry about uniqueness
> constraints, there are ways to deal with that that should be much faster
> (deal with the data in chunks, load into temp tables, check for dupes,
> etc).
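(If I read that suggestion correctly, the chunked approach would look roughly
like the sketch below; table and column names are made up, my real schema is
different.)

    -- hypothetical target table with a uniqueness constraint
    CREATE TABLE IF NOT EXISTS genotypes (
        person_id  INTEGER NOT NULL,
        chromosome TEXT    NOT NULL,
        position   INTEGER NOT NULL,
        value      TEXT,
        UNIQUE (person_id, chromosome, position)
    );

    -- per chunk: fill a temporary staging table (via .import or
    -- application-side INSERTs), then copy over only the non-duplicates
    CREATE TEMP TABLE staging AS SELECT * FROM genotypes WHERE 0;
    -- ... load one chunk into "staging" here ...
    INSERT OR IGNORE INTO genotypes SELECT * FROM staging;
    DROP TABLE staging;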
It is longer than necessary, that's true. I implemented it the way I did
for internal logging purposes. I'm not yet convinced that those logging
capabilities are worth the delay. I'm still in testing mode ;)

> I can tell you that even 750M rows wouldn't be a huge deal for PostgreSQL,
> and 20G of data is nothing. Though your table would take somewhere
> around 30G due to the higher per-row overhead in PostgreSQL; I'm not
> really sure how large the indexes would be.

AFAIK, PostgreSQL is implemented in a client-server architecture. For
maintainability reasons, I try to avoid such a setup.

> As for performance, I haven't seen a single mention of any kind of
> metrics you'd like to hit, so it's impossible to guess as to whether
> SQLite, PostgreSQL, or anything else would suffice.

I posted a couple of timings a few days ago. As far as I can tell, the
performance of SQLite will suffice for my tasks, even if run on ordinary
PC hardware =)

> As for partitioning, you might still have a win if you can identify some
> common groupings, and partition based on that. Even if you can't, you
> could at least get a win on single-person queries.

The data could easily be grouped by chromosome, but I would like to avoid
that, too. I expect it'd be something of a hassle to do multi-chromosome
queries (see the sketch in the P.S. below).

Thanks for your input, nevertheless!

Regards
Daniel
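P.S. To make the "hassle" above a bit more concrete: with per-chromosome
tables, every query spanning chromosomes would have to go through a UNION ALL
(or a view built from one), roughly like this. The names are made up, just to
illustrate the idea:

    -- one table per chromosome
    CREATE TABLE geno_chr1 (person_id INTEGER, position INTEGER, value TEXT);
    CREATE TABLE geno_chr2 (person_id INTEGER, position INTEGER, value TEXT);
    -- ... and so on for the remaining chromosomes

    -- combined view for queries that span chromosomes
    CREATE VIEW geno_all AS
        SELECT 'chr1' AS chromosome, person_id, position, value FROM geno_chr1
        UNION ALL
        SELECT 'chr2', person_id, position, value FROM geno_chr2
        -- UNION ALL ... one branch per remaining chromosome
        ;

    -- a single query can then span chromosomes again
    SELECT count(*) FROM geno_all WHERE person_id = 42;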