Seth Falcon <[EMAIL PROTECTED]> wrote:
Are there any DB tricks you can point me to for dealing with low
cardinality columns?  If I need to access rows as quickly as possible
according to low cardinality column value (e.g. allele = 0/1, strange
= 0/1), would it make more sense to split these into separate tables?

This depends a lot on the kind of queries you need to run. Is the query you show in the original post typical - that is, do you regularly traverse half of all the rows? If so, I don't see you achieving any dramatic improvements - at best your query could run twice as fast as a naive scan of all rows.

Splitting into two tables may help if you never need to run queries that don't differentiate on "allele" field - queries that should produce records with both allele=0 and allele=1.

Is your data read-only after creation, or do you need to modify it? If it is read-only, you may arrange it so that all records with allele=0 are inserted before all records with allele=1. This relies on SQLite's tendency to enumerate records in the order of insertion in the absence of conditions or clauses dictating otherwise. So you keep track of how many rows with allele=0 you have, then you can scan them with something like this:

select * from pmfeature limit :N;  -- allele=0
select * from pmfeature offset :N; -- allele=1

This is rather fragile, any modification may break the arrangement. Exercise caution.

Igor Tandetnik

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to