Re: [sqlite] large table performance

Bogusław Brandys Fri, 03 Mar 2006 00:12:46 -0800

Daniel Franke wrote:

Hi all.
I spent the last days bragging that a single database file as provided bysqlite is a far better approach to store data than -literally- thousands offlat files. Now, I got a small amount of testing data an wow ... I'm stuck.
Area: Bioinformatics. Imagine a matrix of data: genetic marker names(attribute A) in columns and individuals (attribute B) in rows. Since thenumber of features per attribute varies between projects, I decided to createthree tables:
 * Table markers:     the genetic markers (attribute A), e.g. 100.000 rows
 * Table individuals: individual ids (attribute B), e.g. 1.000 rows
 * Table genotypes:   the genetic data
Tables "markers" and "individuals" have 2 and 6 columns respectively, a uniqueprimary key, and the (basically) the name of the feature, "genotypes" holdsforeign keys to "markers"/"individuals" respectively as well as the genotypecolumn(s), see below.
Genotypes are inserted by:
INSERT OR ROLLBACK INTO genotypes VALUES ((SELECT id FROM marker WHEREname='$markername$'), (SELECT id FROM individuals WHERE pedigree='$pedigree$'AND person='$person$'), $genA$, $genB$);
Where $markername$, ..., $genB$ are replaced with the appropiate values.
Given the schema below, feeding a million INSERTs into the database bysqlite3_exec() takes about 30 minutes (this includes transactions, indicesand "PRAGMA synchronous=off" as well as a fully optimized build of sqlite).Is there any chance to speed this up? Production datasets could easily bring abillion genotypes ...
Any pointer would be appreciated!

With kind regards

        Daniel Franke


--

The database schema:


CREATE TABLE marker (
    id integer PRIMARY KEY AUTOINCREMENT,
    name varchar UNIQUE);

CREATE INDEX markernameidx on marker(name);


CREATE TABLE individuals (
    id integer PRIMARY KEY AUTOINCREMENT,
    pedigree varchar NOT NULL,
    person varchar NOT NULL,
    father varchar,
    mother varchar,
    sex integer NOT NULL,
    UNIQUE(pedigree, person));

CREATE INDEX individualidx ON individuals (pedigree, person);


CREATE TABLE genotypes(
    markerid integer NOT NULL REFERENCES marker(id),
    individualid integer NOT NULL REFERENCES individuals(id),
    genA integer,
    genB integer,
    UNIQUE(markerid, individualid));

CREATE INDEX genotypeidx ON genotypes(markerid, individualid);


Test with sqlite3 command line shell with transaction(s)
That would be a better (an easier to reproduce) test.


Regards
Boguslaw

Re: [sqlite] large table performance

Reply via email to