> Mabye one of our experts can explain why distinct takes to long. > Seems to me if you have an index you're just returning the values in the > index, aren't you?
If index is unique then you just count number of values in the index and that's it. If index is not unique then it contains many non-unique values. So you have to traverse all values and compare each of them with the previous one to see whether it's different or not. If you remove word 'distinct' then it will indeed just return values from the index. Pavel On Mon, Feb 7, 2011 at 4:10 PM, Black, Michael (IS) <michael.bla...@ngc.com> wrote: > Mabye one of our experts can explain why distinct takes to long. > Seems to me if you have an index you're just returning the values in the > index, aren't you? > But it looks like it's comparing all keys to all keys. > > Do you just want a count of distinct values? > select count(distinct(tag)) from tags; > I think that will run faster. > > Then again...if you normalize your tags you end up with just count(*) from > the new tag table to find all unique keys > sqlite> create index tags_index on tags(tag); > sqlite> explain select distinct(tag) from tags; > addr opcode p1 p2 p3 p4 p5 comment > ---- ------------- ---- ---- ---- ------------- -- ------------- > 0 Trace 0 0 0 00 > 1 Noop 0 0 0 00 > 2 Integer 0 3 0 00 > 3 Integer 0 2 0 00 > 4 Gosub 5 28 0 00 > 5 Goto 0 31 0 00 > 6 OpenRead 2 4 0 keyinfo(1,BINARY) 00 > 7 Rewind 2 18 8 0 00 > 8 Column 2 0 7 00 > 9 Compare 6 7 1 keyinfo(1,BINARY) 00 > 10 Jump 11 15 11 00 > 11 Move 7 6 1 00 > 12 Gosub 4 23 0 00 > 13 IfPos 3 30 0 00 > 14 Gosub 5 28 0 00 > 15 Column 2 0 1 00 > 16 Integer 1 2 0 00 > 17 Next 2 8 0 00 > 18 Close 2 0 0 00 > 19 Gosub 4 23 0 00 > 20 Goto 0 30 0 00 > 21 Integer 1 3 0 00 > 22 Return 4 0 0 00 > 23 IfPos 2 25 0 00 > 24 Return 4 0 0 00 > 25 Copy 1 9 0 00 > 26 ResultRow 9 1 0 00 > 27 Return 4 0 0 00 > 28 Null 0 1 0 00 > 29 Return 5 0 0 00 > 30 Halt 0 0 0 00 > 31 Transaction 0 0 0 00 > 32 VerifyCookie 0 2 0 00 > 33 TableLock 0 2 0 tags 00 > 34 Goto 0 6 0 00 > sqlite> explain select count(*) from tags; > addr opcode p1 p2 p3 p4 p5 comment > ---- ------------- ---- ---- ---- ------------- -- ------------- > 0 Trace 0 0 0 00 > 1 Goto 0 8 0 00 > 2 OpenRead 1 4 0 keyinfo(1,BINARY) 00 > 3 Count 1 1 0 00 > 4 Close 1 0 0 00 > 5 Copy 1 2 0 00 > 6 ResultRow 2 1 0 00 > 7 Halt 0 0 0 00 > 8 Transaction 0 0 0 00 > 9 VerifyCookie 0 2 0 00 > 10 TableLock 0 2 0 tags 00 > 11 Goto 0 2 0 00 > sqlite> explain select count(distinct(tag)) from tags; > addr opcode p1 p2 p3 p4 p5 comment > ---- ------------- ---- ---- ---- ------------- -- ------------- > 0 Trace 0 0 0 00 > 1 Null 0 2 0 00 > 2 Null 0 1 0 00 > 3 OpenEphemeral 1 0 0 keyinfo(1,BINARY) 00 > 4 Goto 0 18 0 00 > 5 OpenRead 0 2 0 2 00 > 6 Rewind 0 13 0 00 > 7 Column 0 1 3 00 > 8 Found 1 12 3 1 00 > 9 MakeRecord 3 1 4 00 > 10 IdxInsert 1 4 0 00 > 11 AggStep 0 3 1 count(1) 01 > 12 Next 0 7 0 01 > 13 Close 0 0 0 00 > 14 AggFinal 1 1 0 count(1) 00 > 15 Copy 1 5 0 00 > 16 ResultRow 5 1 0 00 > 17 Halt 0 0 0 00 > 18 Transaction 0 0 0 00 > 19 VerifyCookie 0 2 0 00 > 20 TableLock 0 2 0 tags 00 > 21 Goto 0 5 0 00 > > Michael D. Black > Senior Scientist > NG Information Systems > Advanced Analytics Directorate > > > > ________________________________________ > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Yuzem [naujnit...@gmail.com] > Sent: Monday, February 07, 2011 2:13 PM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] EXT : Speed up count(distinct col) > > Black, Michael (IS) wrote: >> >> Test#1 >> create index tags_index on tags(tag); >> You should have an index for any fields you query on like this. >> > > Thanks Michael but I don't see any speed improvement: > create index test on tags(tag); > select count(distinct tag) from tags; > > This is much faster: > select count(*) from tags; > > Am I doing something wrong? > -- > View this message in context: > http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30867275.html > Sent from the SQLite mailing list archive at Nabble.com. > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users