At Thursday I wrote: > Since there's so much interest in this, I'll submit a couple of timings as > soon as possible =)
Here they are: ---------------------------------------------------------- System: P4 2.4Ghz, 1GB RAm, 4GB swap ---------------------------------------------------------- Schema and index of the "genotypes" table: CREATE TABLE genotypes (markerid integer NOT NULL REFERENCES marker(id), individualid integer NOT NULL REFERENCES individuals(id), genA integer, genB integer) CREATE UNIQUE INDEX genotypeidx ON genotypes(markerid, individualid) ---------------------------------------------------------- Size of the database? $> ls -l db -rw-r--r-- 1 <user> <group> 9618644992 Mar 17 16:41 db Hours to create a database that size? 1130 minutes, about 19 hours. ---------------------------------------------------------- How many individuals? (expected: 1500) $> time snpdb-query db "SELECT count (*) FROM individuals" 1500 real 0m0.205s user 0m0.004s sys 0m0.004s ---------------------------------------------------------- How many markers? (expected: 200.000) $> time snpdb-query db "SELECT count (*) FROM marker" 181322 real 0m34.054s user 0m0.048s sys 0m0.200s --> less markers, reason: partition full ---------------------------------------------------------- How many genotypes? (expected: 300.000.000) $> time snpdb-query db "SELECT count (*) FROM genotypes" 271983000 real 5m40.224s user 0m52.015s sys 0m27.010s --> less genotypes, reason: partition full ---------------------------------------------------------- Get all markers of a single individual $> time snpdb-query db "SELECT * from genotypes WHERE individualid==1" > dump real 9m10.964s user 2m22.085s sys 0m27.458s ---------------------------------------------------------- Get all all individuals at a single locus $> time snpdb-query db "SELECT * from genotypes WHERE markerid=12224" > dump real 0m0.118s user 0m0.016s sys 0m0.008s Get multiple markers for all individuals $> time snpdb-query db "SELECT * from genotypes WHERE \ markerid IN (1,10,100,1000,10000,100000)" > dump real 0m0.213s user 0m0.088s sys 0m0.008s Get a subset of individuals for a subset of markers $> time snpdb-query db "SELECT * from genotypes WHERE \ markerid between 100 AND 200 AND \ individualid between 50 AND 500" > dump real 0m0.995s user 0m0.892s sys 0m0.040s --> files are inserted per marker, high locality within the db?! ---------------------------------------------------------- These timings above are singleshots and may vary due to system load. If there is other query/timing of interest, please let me know. Regards Daniel