Dave Hayden wrote:
> On Jun 20, 2004, at 9:07 PM, Darren Duncan wrote:
>
>> Generally speaking, you should only use indexes on table columns that have
>> a lot of distinct values, and each one only appears a few times. You should
>> not use indexes on columns that have few distinct values and each appears
>> many times; in the latter case, a full table scan would be faster.
>
>
> That's weird. I would have thought that having any index at all to pare down
> the result set would make things faster..? Wouldn't the select here:
>
> CREATE TABLE tmp ( flag boolean, name text );
>
> SELECT name FROM tmp WHERE flag = 1 AND name LIKE '%foo%';
>
> run faster with an index on the flag column since it can scan just the flag =
> 1 rows instead of the full table?
>

Let N be the total number of entires in the table and let
K be the number of entries with a matching key.  Then to
do a full table scan requires O(N) time.  To use an index,
you first consult the index to get a list of K rowids of
potential matches.  This takes time O(K).  Then for each
rowid you have to look up the corresponding row in the
btree, an O(logN) operation.  So the total time for using
an index is O(KlogN).

If there is an upper bound on the number of duplicate key
entries in a table (in other words if there are lots of
different keys) then K will be a constant and O(KlogN)
simplifies to O(logN) which is clearly less than O(N).
In that case, it pays to use the index.

But suppose there are only a fixed number of different
key values.  As the N grows, the number of duplicate keys
will increase.  K will be a linear function of N.  So
O(KlogN) becomes O(NlogN) which is clearly more than O(N).
In that case, it is best to use a full table scan when
N is large.


-- D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to