Hey all

I want to use sqlite as a storage engine for DNA sequence data. It is 
part of a workflow engine for DNA processing. I was wondering whether i 
chose the right db design.

The db holds information about DNA sequences. Arranged in three tables.
For every dna sequence one entry in all 3 tables will be made (1:1 
Relation). Data will be inserted once, never deleted, but some 
attributes get updated.

First a static description table which holds information about a DNA
sequence. The unique key, the source etc. Once inserted the data is 
never modified.

CREATE TABLE seq_desc_static(
         seq_id  INTEGER PRIMARY KEY
         file TEXT UNIQUE,
        sample TEXT,
        source TEXT,
        type INTEGER
);


Second a static data table which holds the base sequence for every row 
from seq_desc_static, approx. 30 to 800 bases, and for each base two 
numerical values, a quality value (two digit) and a integer. This data 
will never be modified as well, but will be inserted after the 
seq_desc_static table is populated. This table gets huge.
The table seq_data is just a data storage, query only be seq_id

CREATE TABLE seq_data(
        seq_id INTEGER PRIMARY KEY,
         base TEXT,
        qual TEXT,
        call TEXT
)';

During the processing of a dna sequence we will generate some additional
infos for every row of seq_desc_static. These informations are kept in 
the third table. This table will be updated often during processing.

CREATE TABLE seq_desc_var(
         seq_id INTEGER PRIMARY KEY,
        clear_range TEXT,
        classification INTEGER,
        history TEXT
)

Most of the queries select rows from seq_desc_static and then join with 
the seq_data table to retrieve the data. My impression was that it is 
faster to put the big data in an extra table and to join. Am I right ?

Is it a good solution to create an extra table for the updated data 
'seq_desc_var' or should i put it in seq_desc_static table ?
Selecting all data includes two joins ! That is expensive.


Thanks for any comments




























_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to