Re: [sqlite] EXT : Speed up count(distinct col)

Eric Smith Mon, 07 Feb 2011 12:25:45 -0800

Yuzem wrote:

> > 
> > Test#1
> > create index tags_index on tags(tag);
> > You should have an index for any fields you query on like this.
> > 
> 
> Thanks Michael but I don't see any speed improvement:
> create index test on tags(tag);
> select count(distinct tag) from tags;
> 
> This is much faster:
> select count(*) from tags;
> 
> Am I doing something wrong?


As Michael mentioned, you might be getting killed by string comparisons.

I'm no SQL expert, so gurus are welcome to add to the following without
insulting me.

To beef up Michael's suggestion, try something like this:

CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id));

(BTW, you should consider making movie_id "INTEGER PRIMARY KEY" and then
remove "unique(movie_id)" -- as long as that otherwise makes sense for 
your architecture.  For SQLite-specific reasons that will probably be
faster.)

CREATE TABLE tagNames(tagId INTEGER PRIMARY KEY, tagName TEXT UNIQUE);
CREATE INDEX tagNamesIdxtagName on tagNames(tagName);
CREATE TABLE tags(movie_id INTEGER,
                  tagId INTEGER REFERENCES tagNames,
                  unique(movie_id,tag));

CREATE INDEX tagsIdxTagId on tags(tagId);

Then see how fast it is to ask 
SELECT COUNT(*) FROM tagString;

Note: The index tagNamesIdxtagName is there because you'll probably want it 
to speed up insertions into 'tags'.

Note: If foreign key checking is on, be careful about deletes on table
"tagNames".  Without an index on tags(tagId) a delete of a single row on
tagNames implies a full table scan on tags (to make sure there are no
rows referencing it).  That bit me in the past.

Eric

--
Eric A. Smith

The concept is interesting and well-formed, but in order to earn 
better than a 'C,' the idea must be feasible.
    -- A Yale University management professor in response to Fred Smith's paper
       proposing reliable overnight delivery service.
       (Smith went on to found Federal Express Corp.)
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] EXT : Speed up count(distinct col)

Reply via email to