Kasper Daniel Hansen <[EMAIL PROTECTED]> wrote:
> I have a table with two variables, say A and B (both integers). The  
> table is rather large - around 2.9 GB on disk. Every combination of  
> (A,B) occurs only once. I am creating a unique index as
>    CREATE UNIQUE INDEX ABidx ON abtable (A,B)
> It seems that the (A,B) index is created much slower than the (B,A)  
> index. I am wondering about the reason for this. My - very limited -  
> understanding was that sqlite needs to search over the whole database  
> for every (A,B) combination. Could someone give a quick pointer to  
> where the index strategy is described? Would this be quicker if I fit  
> the whole database into memory?
> 

Creating an index on A,B is equivalent to sorting on A,B.

The sorting algorithm currently used by SQLite requires
O(NlogN) comparisons, which is optimial.  But it also requires
O(N) disk seeks, which is very suboptimal.  You don't notice
all these seeks if your database fits in cache.  But when you
get into databases of about 3GB, the seeking really slows you
down.

A project on our to-do list is to implement a new sorter
that uses O(1) seeks.  We know how to do this.  It is just
finding time to do the implementation.

If creating an index on B,A is much faster than creating an
index on A,B, that probably means that B,A is initially closer
to being in sorted order than A,B is.  The initial order of the
entries does not effect the number of comparisons in a sort,
but it does reduce the number of seeks if the values are 
initially close to being sorted.

--
D. Richard Hipp <[EMAIL PROTECTED]>


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to