Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

Russell Leighton Fri, 26 Sep 2008 10:19:16 -0700

Perfect solution as long as there is a no index option along with  
index by.




On Sep 26, 2008, at 12:54 PM, Russell Leighton <[EMAIL PROTECTED] 
 > wrote:

> I have another scenario where this is needed , the one in the subject.
> I repeated this problem this AM.
>
> I need a 2 key index for some queries and also want to aggregate on
> these 2 columns. I need this index BUT I have many large sqlite dbs I
> iterate over and they won't fit in the filesystem cache. Run time when
> the index is present is 105min. Run time with out the index is 3min.
>
> I see no way a simple query planner can account for factors like
> available ram, disk io speeds and CPU speeds. The solution DRH
> suggests is perfect for my needs.
>
>
>
> On Sep 26, 2008, at 12:38 PM, "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
>
>>
>> On Sep 25, 2008, at 10:37 PM, Alex Scotti wrote:
>>>
>>>
>>> read http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/
>>> index.html as well if you could.
>>>
>>>
>>> i implore you all to take the high road here.
>>
>>
>> I agree with philosophy expressed at the link above: "If [the RDBMS]
>> does not choose the optimal access plan and the poor access plan is
>> not due to a limitation inherent in the query, we consider it a  
>> defect
>> in the product and prefer to fix the problem at the source so that
>> all ... users may benefit ...."  The intent of the INDEXED BY syntax
>> in SQLite is *not* to provide users a way to tune queries.  The
>> problem we are attempting to solve is one of detecting a performance
>> regression due to coding erors. Here are the two problem reports from
>> real SQLite users that we are trying to resolve:
>>
>> (1)  A workstation application uses SQLite as its database file
>> format.  During a product enhancement, one of the indices in the
>> schema was deleted by mistake, which caused certain queries to run
>> much slower.  But the slowdown was not detected during testing  
>> because
>> very large application files were necessary for the slowdown to
>> appear.  The vendor asks for some way to detect during testing that
>> the query plan has changed.
>>
>> (2) A web portal uses SQLite databases to store per-customer state
>> information - one SQLite database per customer.  These millions of
>> SQLite database are stored on a network filesystem.  Access must be
>> efficient in order to prevent the file servers from being
>> overwhelmed.  If a critical index is deleted by mistake, the
>> applications will still work fine during testing (because SQLite will
>> just use a different query plan) but might buckle under real-world
>> loads.  There is a significant chance that the problem will not be
>> detected until the upgrade is moved into production and millions of
>> users start pounding on it all at once.  The vendor lives in terror  
>> of
>> this scenario and would like a way to detect the query plan change
>> early - before a large scale rollout and subsequent disruption of
>> their service.
>>
>> The MySQL, Oracle, and MSSQL hinting solutions are not applicable to
>> the above problems because they are only hints.  If the hints cannot
>> be satisfied, the query plan silently reverts to something else.  But
>> in my proposed INDEXED BY clause, if the query plan specified by the
>> INDEXED BY clause cannot be used, then the query fails with an error.
>> This allows developers to detect problems in a few critical queries
>> early, before a large rollout.  To put it another way, the INDEXED BY
>> clause is more like a CHECK constraint than a hint - only that the
>> constraint applies to the query plan instead of the database content.
>>
>> My original idea on how to solve the problems above was to provide
>> some new API that returned an estimate of the performance for a
>> prepared statement.  Then an index change that caused (for example)  
>> an
>> O(logN) to O(N) performance regression could be detected using the  
>> new
>> API.  That sounds like a good approach upon first hearing, but as we
>> have dug deeper, we have uncovered many subtleties that make it much
>> less appealing.  The INDEXED BY clause, in contrast, is simple,
>> direct, and gets the job done with remarkably little fuss.
>>
>>
>>
>> D. Richard Hipp
>> [EMAIL PROTECTED]
>>
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

Reply via email to