Kasper Daniel Hansen <[EMAIL PROTECTED]> wrote: > I have a table with two variables, say A and B (both integers). The > table is rather large - around 2.9 GB on disk. Every combination of > (A,B) occurs only once. I am creating a unique index as > CREATE UNIQUE INDEX ABidx ON abtable (A,B) > It seems that the (A,B) index is created much slower than the (B,A) > index. I am wondering about the reason for this. My - very limited - > understanding was that sqlite needs to search over the whole database > for every (A,B) combination. Could someone give a quick pointer to > where the index strategy is described? Would this be quicker if I fit > the whole database into memory? >
Creating an index on A,B is equivalent to sorting on A,B. The sorting algorithm currently used by SQLite requires O(NlogN) comparisons, which is optimial. But it also requires O(N) disk seeks, which is very suboptimal. You don't notice all these seeks if your database fits in cache. But when you get into databases of about 3GB, the seeking really slows you down. A project on our to-do list is to implement a new sorter that uses O(1) seeks. We know how to do this. It is just finding time to do the implementation. If creating an index on B,A is much faster than creating an index on A,B, that probably means that B,A is initially closer to being in sorted order than A,B is. The initial order of the entries does not effect the number of comparisons in a sort, but it does reduce the number of seeks if the values are initially close to being sorted. -- D. Richard Hipp <[EMAIL PROTECTED]> _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users