At Thursday I wrote:
> Since there's so much interest in this, I'll submit a couple of timings as
> soon as possible =)
Here they are:
----------------------------------------------------------
System: P4 2.4Ghz, 1GB RAm, 4GB swap
----------------------------------------------------------
Schema and index of the "genotypes" table:
CREATE TABLE genotypes (markerid integer NOT NULL REFERENCES marker(id),
individualid integer NOT NULL REFERENCES individuals(id),
genA integer,
genB integer)
CREATE UNIQUE INDEX genotypeidx ON genotypes(markerid, individualid)
----------------------------------------------------------
Size of the database?
$> ls -l db
-rw-r--r-- 1 <user> <group> 9618644992 Mar 17 16:41 db
Hours to create a database that size?
1130 minutes, about 19 hours.
----------------------------------------------------------
How many individuals? (expected: 1500)
$> time snpdb-query db "SELECT count (*) FROM individuals"
1500
real 0m0.205s
user 0m0.004s
sys 0m0.004s
----------------------------------------------------------
How many markers? (expected: 200.000)
$> time snpdb-query db "SELECT count (*) FROM marker"
181322
real 0m34.054s
user 0m0.048s
sys 0m0.200s
--> less markers, reason: partition full
----------------------------------------------------------
How many genotypes? (expected: 300.000.000)
$> time snpdb-query db "SELECT count (*) FROM genotypes"
271983000
real 5m40.224s
user 0m52.015s
sys 0m27.010s
--> less genotypes, reason: partition full
----------------------------------------------------------
Get all markers of a single individual
$> time snpdb-query db "SELECT * from genotypes WHERE individualid==1" > dump
real 9m10.964s
user 2m22.085s
sys 0m27.458s
----------------------------------------------------------
Get all all individuals at a single locus
$> time snpdb-query db "SELECT * from genotypes WHERE markerid=12224" > dump
real 0m0.118s
user 0m0.016s
sys 0m0.008s
Get multiple markers for all individuals
$> time snpdb-query db "SELECT * from genotypes WHERE \
markerid IN (1,10,100,1000,10000,100000)" > dump
real 0m0.213s
user 0m0.088s
sys 0m0.008s
Get a subset of individuals for a subset of markers
$> time snpdb-query db "SELECT * from genotypes WHERE \
markerid between 100 AND 200 AND \
individualid between 50 AND 500" > dump
real 0m0.995s
user 0m0.892s
sys 0m0.040s
--> files are inserted per marker, high locality within the db?!
----------------------------------------------------------
These timings above are singleshots and may vary due to system load.
If there is other query/timing of interest, please let me know.
Regards
Daniel