On Thu, Apr 10, 2008 at 10:30:34AM -0700, Steve Krulewitz scratched on the wall:
> On Tue, Apr 8, 2008 at 2:03 PM, Jay A. Kreibich <[EMAIL PROTECTED]> wrote:
> >   The stat table actually hurts in this case, since the stats for both
> >   indexes are the same.  This appears to be because two NULL values
> >   appear to be considered distinct values, so the analysis system sees
> >   column a as being full of unique values.  That makes the "usefulness"
> >   weight of idx_a highly inflated.
> 
> This may indeed be a bug -- thanks for filing it.  However, even after
> manually updating the sqlite_stat1 table with the proper values (as if
> the bug were fixed), the planner still did exactly the same thing.

  That's not a surprise.

  I still think the basic problem is that the query planner assumes any
  "is null" will return a small percentage of rows, and that any "is
  not null" is going to return a high percentage of rows.

  This is somewhat reflected in the code, as it looks like it will try
  to use an index for an "is null" expression (if it exists) but will
  not attempt to use an index for a "is not null" expression.

  In specific, it looks like the code will try to use an index for an
  "is null" expression, but will not use an index for an "is not null"
  expression.  That fits the idea that "is null" is assumed to return
  few rows and "is not null" is assumed to return a large number of rows.

> And it runs very very _very_ slowly.  Forcing sqlite not to use
> idx_a, you get:

  In this case, the index on a column that is mostly NULLs is "worse
  than useless" as it contains a very large number of rows with the
  same value.  Generally, you don't want indexes on columns that that
  have a noticeable percentage of similar values.  Personally I don't
  have a lot of experience with this, but the rule of thumb I've heard
  is that if any single value makes up more than about 15% of the rows
  in a table, you don't want an index on that table-- it is actually
  less expensive to just scan the whole table.

  This, on top of the assumptions made by the query planner, is causing
  great problems for this specific situation.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to