At Thursday I wrote:
> Since there's so much interest in this, I'll submit a couple of timings as
> soon as possible =)

Here they are:

----------------------------------------------------------
System: P4 2.4Ghz, 1GB RAm, 4GB swap

----------------------------------------------------------
Schema and index of the "genotypes" table:

CREATE TABLE genotypes (markerid integer NOT NULL REFERENCES marker(id),
         individualid integer NOT NULL REFERENCES individuals(id),
         genA integer,
         genB integer)

CREATE UNIQUE INDEX genotypeidx ON genotypes(markerid, individualid)

----------------------------------------------------------
Size of the database?
$> ls -l db
-rw-r--r--  1 <user> <group> 9618644992 Mar 17 16:41 db

Hours to create a database that size?
1130 minutes, about 19 hours.

----------------------------------------------------------
How many individuals? (expected: 1500)
$> time snpdb-query db "SELECT count (*) FROM individuals"
1500

real    0m0.205s
user    0m0.004s
sys     0m0.004s

----------------------------------------------------------
How many markers? (expected: 200.000)
$> time snpdb-query db "SELECT count (*) FROM marker"
181322

real    0m34.054s
user    0m0.048s
sys     0m0.200s

 --> less markers, reason: partition full

----------------------------------------------------------
How many genotypes? (expected: 300.000.000)
$> time snpdb-query db "SELECT count (*) FROM genotypes"
271983000

real    5m40.224s
user    0m52.015s
sys     0m27.010s

 --> less genotypes, reason: partition full

----------------------------------------------------------
Get all markers of a single individual 
$> time snpdb-query db "SELECT * from genotypes WHERE individualid==1" > dump

real    9m10.964s
user    2m22.085s
sys     0m27.458s

----------------------------------------------------------
Get all all individuals at a single locus
$> time snpdb-query db "SELECT * from genotypes WHERE markerid=12224" > dump

real    0m0.118s
user    0m0.016s
sys     0m0.008s

Get multiple markers for all individuals
$> time snpdb-query db "SELECT * from genotypes WHERE \
       markerid IN (1,10,100,1000,10000,100000)" > dump

real    0m0.213s
user    0m0.088s
sys     0m0.008s

Get a subset of individuals for a subset of markers
$> time snpdb-query db "SELECT * from genotypes WHERE \
       markerid between 100 AND 200 AND \
       individualid between 50 AND 500" > dump

real    0m0.995s
user    0m0.892s
sys     0m0.040s


  --> files are inserted per marker, high locality within the db?!

----------------------------------------------------------

These timings above are singleshots and may vary due to system load.

If there is other query/timing of interest, please let me know.


Regards
        Daniel

Reply via email to