Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

Jean-Christophe Deschamps Mon, 01 Mar 2010 14:45:10 -0800

>   So indexes are not used for NOT conditions, as NOT conditions
>   generally require a full scan, regardless.  Yes, it is a simple
>   reverse of a binary test, but the reverse of a specific indexed
>   lookup of a known value is a table scan to gather all the unknown
>   values.


Jay,

I understand how your argument goes, but I still have difficulty buying 
it.  Why not negate the condition itself and then only proceed?

Under any condition K, a table T is the union of the subset X of 
elements that match the condition K and the subset Y of elements that 
don't.  In other words, Y is the subset whose elements match [not 
K].  In SQL there is of course the special case for nulls.

You're saying that selecting all rows in T that match K is OK for 
indexing, while selecting rows in T which match the negated condition 
K' = not K can only be done with a full scan.

That is true iff the complementation is done afterwards or on the fly 
(full scan).  My point is that NOT seems to be acting as a 
complementation operator: select X then full scan and check every row 
if it belongs to X or not.

I'd rather see the negated condition K' (= not K) as the condition used 
for indexed selection, again when using the 'complementary' condition 
makes sense.

Pavel modified simple example works well here:

condition  "     ... >  3"         OK for index search
condition  "     ... <= 3"         OK for index search
condition  "NOT  ... >  3"         only full scan:     why?

I still believe that it's possible that many simple conditions be 
negated and used negated for index search.

Now, that it would easy to implement in the current SQLite code is 
another matter entirely and I never pretended it could be done within 
minutes.

The actual reason for the way NOT works as for now may be due to the 
fact that negating a condition may cause the resulting set to be in 
fact itself the union of two subsets.
Say the "where" condition K is "col = 12345".  We can see the index 
split into not less than four subsets:
   o) N = {rows with col is null}
   o) X = {rows with col < 12345}
   o) Y = {rows with col = 12345}
   o) Z = {rows with col > 12345}

For the "positive" condition K, the resulset is Y.
For the negated condition K' (col <> 12345) the resulset is N u X u Z, 
made of two (or three ?) distinct blocks of indexed rowids.  A possible 
explanation is that the current code may be unable to cope with 
situations where the resultset is not index-wise contiguous.


The OP request can be written in a number of ways to make use of the 
index, even:
select count(*) from mybigtable where mytextcolumn < '' or 
mytextcolumn >= '';
so the question is not if SQLite can process equivalent count/querries 
efficiently or not.  I nonetheless think "is not null" is much clearer 
than the above workaround. 



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

Reply via email to