"Igor Tandetnik" <[EMAIL PROTECTED]> writes:
> This is not unusual when you index a field with low cardinality
> (number of distinct values). To illustrate, imagine that allele=0 in
> all rows. A simple scan without using an index would look once at each
> row, for a complexity of O(N) (where N is the number of rows in a
> table). A scan using an index would visit every entry in the index
> (there are N of them), obtain rowid of the corresponding row and then
> look up the field "fid" based on this rowid (a log(N) operation), for
> an overall complexity of O(N log N).

Igor, thanks for that explaination.  Makes sense.

> In this particular case, you can improve performance of this specific
> query if you declare fid as "integer primary key" (and drop UNIQUE
> constraint, as it is implied by primary key). This will make fid a
> synonym for rowid, and the query will be satisfied by scanning the
> index alone, without ever referring to the table.
>
> In general, however, you should avoid indexing low cardinality fields
> at all: it hurts more often than it helps.

Are there any DB tricks you can point me to for dealing with low
cardinality columns?  If I need to access rows as quickly as possible
according to low cardinality column value (e.g. allele = 0/1, strange
= 0/1), would it make more sense to split these into separate tables?

+ seth


-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to