Re: [sqlite] Very slow when query max(col) on empty set?

Woody Wu Sun, 18 May 2014 21:46:32 -0700

On Friday, 9 May 2014, RSmith <[email protected]> wrote:

>
> On 2014/05/08 18:16, Woody Wu wrote:
>
>> On 2014年5月8日 GMT+08:00PM11:59:33, RSmith <[email protected]> wrote:
>>
>>> On 2014/05/08 14:13, Woody Wu wrote:
>>>
>>>> Do you mean that my index for this case is already the best choice?
>>>>
>>>> And, as a general question, for a query in the form of
>>>>     select max(a) from table where a < InF and b=B and c=C.
>>>> what's the best index in the case of (1) B is an existed one, or (2)
>>>>
>>> B is not exists ?
>>>
>>> Imagine you have a table like this:
>>>
>>> A, B, C
>>> 1, 1, 1
>>> 2, 1, 2
>>> 3, 2, 1
>>> 4, 1, 3
>>> 5, 1, 1
>>> 6, 2, 1
>>> 7, 1, 2
>>>
>>> etc.
>>>
>>>
>> Thanks Smith, you explained quite clear in your way.  I understood that
>> things has to be that if sqlite could only work like that.
>>
>> What I was wondering is that why sqlite couldn't find out earlier that
>> b=8 c=1 denotes an empty set? If I run something like "select * from table
>> where b=8 and c=1", it could return very quickly. If so, the max(a) on an
>> empty set should alway be Null.  Am I thinking right?
>>
>>
>> -Woody Wu
>>
>
> It makes no sense if there isn't some index defined on b or c or both. If
> there is, then it is the same happy accident accident looked at in a
> different way. The reason that the query returns fast if you "Just" do
> WHERE b=8 AND c=1 is that there is either an automatic or explicit index on
> column b and/or c - but - as soon as you change the query to include
> grouping (i.e: max(a)) then that specific index is no longer the best one
> to use - and you cannot expect SQLite to first run some other arbitrary
> "test" query to see if it has no results, and only THEN try your whole
> query after it "knows" whether the other model query has actual results or
> not. Put another way, you CAN construct an Index that will be VERY fast for
> queries where there definitely is NO value in the list, consider again the
> table from earlier:
>
> A, B, C
> 1, 1, 1
> 2, 1, 2
> 3, 2, 1
> 4, 1, 3
> 5, 1, 1
> 6, 2, 1
> 7, 1, 2
>
> Now add Index (B, C, A) - a covering Index which first lists B and C, then
> A... now if you were to ask of it:
>
> SELECT max(a) from table where a<INF AND b=8 and c=1;
>
> It will start by scanning the new Index for values where b=8... and Not
> find any, so very quickly return with no results.
>
> but now let's say you change the query to:
>
> SELECT max(a) from table where a<INF AND b=1 and c=1;
>
> It starts (as before) scanning the table (assuming it uses the new index -
> this is not a given) and finds that the very first entry indeed b=1 and
> c=1.. so it checks a<inf which is also true... nice, but that's only the
> first check, max(a) requires we scan all other values to find the max of a,
> at least all values where b=1 and c=1.. of which there might be very very
> many meaning this query is a LOT slower now than NOT using that index (as
> you had it in your original post when you get very fast results).
>
> So the new Index is optimized to be quick when the b value DOESN'T exist,
> but slow when it DOES... and the original index is better to use if the b
> value DOES exist, but slow when it DOESN'T.  Again, there is no best index
> that will satisfy both those requirements. Simon's advice is still best re
> making Indices and then running Analyze to see which queries get used, and
> whether they have good times, then removing the bad ones.
>
> If there is a way you could tell before-hand whether a value exists or
> not, that would be better, and nothing stoppping you from actually running
> two queries every time, first check if it exists, then, if it does, run the
> second query that is faster (as the original). - Actually this might make
> your unique problem into the fastest possible results, but testing is
> needed to verify and it isn't a solution you can short-hand off to the
> Query planner... you need to do two queries.
>
>
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


Simon,

After the explanation, I understand the problem so much better than before.
 I think I would to use the two queries solution.  Thanks a lot!



-- 
Life is the only flaw in an otherwise perfect nonexistence
    -- Schopenhauer

narke
public key at http://subkeys.pgp.net:11371 ([email protected])
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Very slow when query max(col) on empty set?

Reply via email to