Mabye one of our experts can explain why distinct takes to long. Seems to me if you have an index you're just returning the values in the index, aren't you? But it looks like it's comparing all keys to all keys.
Do you just want a count of distinct values? select count(distinct(tag)) from tags; I think that will run faster. Then again...if you normalize your tags you end up with just count(*) from the new tag table to find all unique keys sqlite> create index tags_index on tags(tag); sqlite> explain select distinct(tag) from tags; addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Trace 0 0 0 00 1 Noop 0 0 0 00 2 Integer 0 3 0 00 3 Integer 0 2 0 00 4 Gosub 5 28 0 00 5 Goto 0 31 0 00 6 OpenRead 2 4 0 keyinfo(1,BINARY) 00 7 Rewind 2 18 8 0 00 8 Column 2 0 7 00 9 Compare 6 7 1 keyinfo(1,BINARY) 00 10 Jump 11 15 11 00 11 Move 7 6 1 00 12 Gosub 4 23 0 00 13 IfPos 3 30 0 00 14 Gosub 5 28 0 00 15 Column 2 0 1 00 16 Integer 1 2 0 00 17 Next 2 8 0 00 18 Close 2 0 0 00 19 Gosub 4 23 0 00 20 Goto 0 30 0 00 21 Integer 1 3 0 00 22 Return 4 0 0 00 23 IfPos 2 25 0 00 24 Return 4 0 0 00 25 Copy 1 9 0 00 26 ResultRow 9 1 0 00 27 Return 4 0 0 00 28 Null 0 1 0 00 29 Return 5 0 0 00 30 Halt 0 0 0 00 31 Transaction 0 0 0 00 32 VerifyCookie 0 2 0 00 33 TableLock 0 2 0 tags 00 34 Goto 0 6 0 00 sqlite> explain select count(*) from tags; addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Trace 0 0 0 00 1 Goto 0 8 0 00 2 OpenRead 1 4 0 keyinfo(1,BINARY) 00 3 Count 1 1 0 00 4 Close 1 0 0 00 5 Copy 1 2 0 00 6 ResultRow 2 1 0 00 7 Halt 0 0 0 00 8 Transaction 0 0 0 00 9 VerifyCookie 0 2 0 00 10 TableLock 0 2 0 tags 00 11 Goto 0 2 0 00 sqlite> explain select count(distinct(tag)) from tags; addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Trace 0 0 0 00 1 Null 0 2 0 00 2 Null 0 1 0 00 3 OpenEphemeral 1 0 0 keyinfo(1,BINARY) 00 4 Goto 0 18 0 00 5 OpenRead 0 2 0 2 00 6 Rewind 0 13 0 00 7 Column 0 1 3 00 8 Found 1 12 3 1 00 9 MakeRecord 3 1 4 00 10 IdxInsert 1 4 0 00 11 AggStep 0 3 1 count(1) 01 12 Next 0 7 0 01 13 Close 0 0 0 00 14 AggFinal 1 1 0 count(1) 00 15 Copy 1 5 0 00 16 ResultRow 5 1 0 00 17 Halt 0 0 0 00 18 Transaction 0 0 0 00 19 VerifyCookie 0 2 0 00 20 TableLock 0 2 0 tags 00 21 Goto 0 5 0 00 Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate ________________________________________ From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Yuzem [naujnit...@gmail.com] Sent: Monday, February 07, 2011 2:13 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] EXT : Speed up count(distinct col) Black, Michael (IS) wrote: > > Test#1 > create index tags_index on tags(tag); > You should have an index for any fields you query on like this. > Thanks Michael but I don't see any speed improvement: create index test on tags(tag); select count(distinct tag) from tags; This is much faster: select count(*) from tags; Am I doing something wrong? -- View this message in context: http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30867275.html Sent from the SQLite mailing list archive at Nabble.com. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users