I was thinking about this issue some time ago, and it occurred to me that the root node of a Btree is a great estimator of the contents of the tree. Some brief mucking around proves this to me empirically, and I'm guessing you could prove that it is never out by more than 2x, due to the balancing properties of a Btree. I haven't even looked at the Sqlite source to tell whether it uses this info, but that's where I would begin if I needed such an optimization. Basically, if you have NULLs in a ratio of 9:1, and NULLs sort first, then your root node would be filled with NULLs and non-NULLs in that proportion. A simple count of the number of NULLs in the root node, divided by the number of elements in that node, gives you an estimate for the NULL density. Sorry for suggesting instead of doing.. I'm just too lazy right now to try coding it up.
Ben On Thu, Apr 10, 2008 at 8:06 PM, D. Richard Hipp <[EMAIL PROTECTED]> wrote: > > On Apr 10, 2008, at 1:30 PM, Steve Krulewitz wrote: > > > > This problem also affects the table ordering of my joins. Taking the > > same schema and adding a "detail" table (see > > http://skrul.com/index_detail.sh) , this query chooses idx_a and puts > > the foo table first: > > > > sqlite> explain query plan select * from foo join foo_detail on foo.id > > = foo_detail.id where (a is null) and data = 1; > > 0|0|TABLE foo WITH INDEX idx_a > > 1|1|TABLE foo_detail WITH INDEX idx_foo_detail_data > > > > And it runs very very _very_ slowly. Forcing sqlite not to use > > idx_a, you get: > > > > sqlite> explain query plan select * from foo join foo_detail on foo.id > > = foo_detail.id where (a is null or a is not null) and data = 1; > > 0|1|TABLE foo_detail WITH INDEX idx_foo_detail_data > > 1|0|TABLE foo USING PRIMARY KEY > > > > And it runs brilliantly fast as you'd expect. > > > > For now I've added the "or" hack into my application (thanks, Ken) and > > this has sped things up considerably. Is there any better way to fix > > this? > > > > Use "+a is null" instead of "a is null". (Add a "+" in front of the > column name). The plus sign is a no-op - it generates no code > and returns the value of its operand unchanged even if the value > is a string. But it also disables the term as a candidate for using > indices. > > > D. Richard Hipp > [EMAIL PROTECTED] > > > > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users