> So indexes are not used for NOT conditions, as NOT conditions
> generally require a full scan, regardless. Yes, it is a simple
> reverse of a binary test, but the reverse of a specific indexed
> lookup of a known value is a table scan to gather all the unknown
> values.
Jay,
I understand how your argument goes, but I still have difficulty buying
it. Why not negate the condition itself and then only proceed?
Under any condition K, a table T is the union of the subset X of
elements that match the condition K and the subset Y of elements that
don't. In other words, Y is the subset whose elements match [not
K]. In SQL there is of course the special case for nulls.
You're saying that selecting all rows in T that match K is OK for
indexing, while selecting rows in T which match the negated condition
K' = not K can only be done with a full scan.
That is true iff the complementation is done afterwards or on the fly
(full scan). My point is that NOT seems to be acting as a
complementation operator: select X then full scan and check every row
if it belongs to X or not.
I'd rather see the negated condition K' (= not K) as the condition used
for indexed selection, again when using the 'complementary' condition
makes sense.
Pavel modified simple example works well here:
condition " ... > 3" OK for index search
condition " ... <= 3" OK for index search
condition "NOT ... > 3" only full scan: why?
I still believe that it's possible that many simple conditions be
negated and used negated for index search.
Now, that it would easy to implement in the current SQLite code is
another matter entirely and I never pretended it could be done within
minutes.
The actual reason for the way NOT works as for now may be due to the
fact that negating a condition may cause the resulting set to be in
fact itself the union of two subsets.
Say the "where" condition K is "col = 12345". We can see the index
split into not less than four subsets:
o) N = {rows with col is null}
o) X = {rows with col < 12345}
o) Y = {rows with col = 12345}
o) Z = {rows with col > 12345}
For the "positive" condition K, the resulset is Y.
For the negated condition K' (col <> 12345) the resulset is N u X u Z,
made of two (or three ?) distinct blocks of indexed rowids. A possible
explanation is that the current code may be unable to cope with
situations where the resultset is not index-wise contiguous.
The OP request can be written in a number of ways to make use of the
index, even:
select count(*) from mybigtable where mytextcolumn < '' or
mytextcolumn >= '';
so the question is not if SQLite can process equivalent count/querries
efficiently or not. I nonetheless think "is not null" is much clearer
than the above workaround.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users