On 2014/05/08 18:16, Woody Wu wrote:
On 2014年5月8日 GMT+08:00PM11:59:33, RSmith <rsm...@rsweb.co.za> wrote:
On 2014/05/08 14:13, Woody Wu wrote:
Do you mean that my index for this case is already the best choice?

And, as a general question, for a query in the form of
    select max(a) from table where a < InF and b=B and c=C.
what's the best index in the case of (1) B is an existed one, or (2)
B is not exists ?

Imagine you have a table like this:

A, B, C
1, 1, 1
2, 1, 2
3, 2, 1
4, 1, 3
5, 1, 1
6, 2, 1
7, 1, 2

etc.


Thanks Smith, you explained quite clear in your way.  I understood that things 
has to be that if sqlite could only work like that.

What I was wondering is that why sqlite couldn't find out earlier that b=8 c=1 denotes an 
empty set? If I run something like "select * from table where b=8 and c=1", it 
could return very quickly. If so, the max(a) on an empty set should alway be Null.  Am I 
thinking right?


-Woody Wu

It makes no sense if there isn't some index defined on b or c or both. If there is, then it is the same happy accident accident looked at in a different way. The reason that the query returns fast if you "Just" do WHERE b=8 AND c=1 is that there is either an automatic or explicit index on column b and/or c - but - as soon as you change the query to include grouping (i.e: max(a)) then that specific index is no longer the best one to use - and you cannot expect SQLite to first run some other arbitrary "test" query to see if it has no results, and only THEN try your whole query after it "knows" whether the other model query has actual results or not. Put another way, you CAN construct an Index that will be VERY fast for queries where there definitely is NO value in the list, consider again the table from earlier:

A, B, C
1, 1, 1
2, 1, 2
3, 2, 1
4, 1, 3
5, 1, 1
6, 2, 1
7, 1, 2

Now add Index (B, C, A) - a covering Index which first lists B and C, then A... 
now if you were to ask of it:

SELECT max(a) from table where a<INF AND b=8 and c=1;

It will start by scanning the new Index for values where b=8... and Not find 
any, so very quickly return with no results.

but now let's say you change the query to:

SELECT max(a) from table where a<INF AND b=1 and c=1;

It starts (as before) scanning the table (assuming it uses the new index - this is not a given) and finds that the very first entry indeed b=1 and c=1.. so it checks a<inf which is also true... nice, but that's only the first check, max(a) requires we scan all other values to find the max of a, at least all values where b=1 and c=1.. of which there might be very very many meaning this query is a LOT slower now than NOT using that index (as you had it in your original post when you get very fast results).

So the new Index is optimized to be quick when the b value DOESN'T exist, but slow when it DOES... and the original index is better to use if the b value DOES exist, but slow when it DOESN'T. Again, there is no best index that will satisfy both those requirements. Simon's advice is still best re making Indices and then running Analyze to see which queries get used, and whether they have good times, then removing the bad ones.

If there is a way you could tell before-hand whether a value exists or not, that would be better, and nothing stoppping you from actually running two queries every time, first check if it exists, then, if it does, run the second query that is faster (as the original). - Actually this might make your unique problem into the fastest possible results, but testing is needed to verify and it isn't a solution you can short-hand off to the Query planner... you need to do two queries.


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to