On Mon, Mar 01, 2010 at 06:03:49PM +0100, Jean-Christophe Deschamps scratched 
on the wall:
> 
> >I totally disagree with you. Let's say you have 1,000,000 rows and 100
> >of them contain NULL. In this situation selecting NOT NULL will select
> >almost all rows which means that using index in this case doesn't give
> >any performance boost. So here using full scan for NOT NULL condition
> >is better and for NULL condition using index is better. Everything is
> >completely legitimate.

> The number of rows concerned either way is not the point. 

  Unless you have stats to prove otherwise, the optimizer will
  generally assume you're trying to select a smaller subset.

  And SQLite's stats get weird when a lot of NULLs are involved, as it
  sees NULLs as unique values.  That's correct for 3VL, but not so much
  for calculating the uniqueness of a column.

> Also the 
> NULL condition could be almost anything else.  Situations where the 1M 
> and 100 figures are reversed would invalidate your point, so?

  No, you'd still need/want a table scan (see below), but for different
  reasons.

> select * from T where col1 like 'abc%' and col2 between 3 and 18;
> uses the index on T
> 
> select * from T where col1 NOT like 'abc%' and col2 between 1 and 24;
> doesn't, due to the use of NOT.
> 
> That in some particular situation using an index could be better or 
> worse is again not the point.  The point was that NOT <condition> is 
> simply reversing the issue of a binary test, in fine, and that seems 
> essentially independant of the use of an index for determining <condition>.

  You can't use an index to efficiently find something that "isn't."

  Indexes are optimized to find specific values.  You have to know what
  you're looking for before you start.  If you need to do a full scan,
  you should just use the regular table structure, not the index.
  Most indexes have a very high overhead in returning rows.  That's why
  they're only useful if they target a very small percentage of rows
  (typically less than 15%).

  In the case of "not" operators, you don't know what you're looking
  for.  In the case of NOTs, you can't use an index to find what you want,
  only to find what you don't want, so that's not useful.

  For example, if I hand you a phone book and tell you to find all the
  phone numbers for everyone with a lastname that *doesn't* start with
  'X', you can use the phone book "index" to quickly find those numbers
  you don't want.

  But that isn't useful unless you do a full scan to generate the pool
  of numbers you might want, and then subtract out the numbers you
  don't want.  In other words, you need to do a full data scan.  Once
  that's the case, it is faster to generate the list of return values
  with a full scan, testing and rejecting as you do the scan.

  So indexes are not used for NOT conditions, as NOT conditions
  generally require a full scan, regardless.  Yes, it is a simple
  reverse of a binary test, but the reverse of a specific indexed
  lookup of a known value is a table scan to gather all the unknown
  values.

> >BTW, when SQLite uses index on "text" field it needs to load full
> >values of "text" field anyway.
> 
> Do you mean that
>      select * from T where rowid = 1;
> needs loading the _entire_ index when T has rowids in 1..10000000 ?

  No.  
  
  But a row is a row and a value is a value.  It does not matter if it
  is a NULL or a TEXT data type.  If you want to test a column value,
  you need to load it.  So the question of "Why load a TEXT value to
  test for NULL?" is an odd one, since you clearly can't test something
  you don't know.

  Along with that, SQLite indexes use full data copies.  So an index of
  a column full of TEXT values has a full copy of the TEXT values.  So
  loading is still loading.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to