"Igor Tandetnik" <[EMAIL PROTECTED]> writes: > This is not unusual when you index a field with low cardinality > (number of distinct values). To illustrate, imagine that allele=0 in > all rows. A simple scan without using an index would look once at each > row, for a complexity of O(N) (where N is the number of rows in a > table). A scan using an index would visit every entry in the index > (there are N of them), obtain rowid of the corresponding row and then > look up the field "fid" based on this rowid (a log(N) operation), for > an overall complexity of O(N log N).
Igor, thanks for that explaination. Makes sense. > In this particular case, you can improve performance of this specific > query if you declare fid as "integer primary key" (and drop UNIQUE > constraint, as it is implied by primary key). This will make fid a > synonym for rowid, and the query will be satisfied by scanning the > index alone, without ever referring to the table. > > In general, however, you should avoid indexing low cardinality fields > at all: it hurts more often than it helps. Are there any DB tricks you can point me to for dealing with low cardinality columns? If I need to access rows as quickly as possible according to low cardinality column value (e.g. allele = 0/1, strange = 0/1), would it make more sense to split these into separate tables? + seth ----------------------------------------------------------------------------- To unsubscribe, send email to [EMAIL PROTECTED] -----------------------------------------------------------------------------