Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

D. Richard Hipp Fri, 26 Sep 2008 09:38:49 -0700

On Sep 25, 2008, at 10:37 PM, Alex Scotti wrote:
>
>
> read http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/
> index.html as well if you could.
>
>
> i implore you all to take the high road here.



I agree with philosophy expressed at the link above: "If [the RDBMS]  
does not choose the optimal access plan and the poor access plan is  
not due to a limitation inherent in the query, we consider it a defect  
in the product and prefer to fix the problem at the source so that  
all ... users may benefit ...."  The intent of the INDEXED BY syntax  
in SQLite is *not* to provide users a way to tune queries.  The  
problem we are attempting to solve is one of detecting a performance  
regression due to coding erors. Here are the two problem reports from  
real SQLite users that we are trying to resolve:

(1)  A workstation application uses SQLite as its database file  
format.  During a product enhancement, one of the indices in the  
schema was deleted by mistake, which caused certain queries to run  
much slower.  But the slowdown was not detected during testing because  
very large application files were necessary for the slowdown to  
appear.  The vendor asks for some way to detect during testing that  
the query plan has changed.

(2) A web portal uses SQLite databases to store per-customer state  
information - one SQLite database per customer.  These millions of  
SQLite database are stored on a network filesystem.  Access must be  
efficient in order to prevent the file servers from being  
overwhelmed.  If a critical index is deleted by mistake, the  
applications will still work fine during testing (because SQLite will  
just use a different query plan) but might buckle under real-world  
loads.  There is a significant chance that the problem will not be  
detected until the upgrade is moved into production and millions of  
users start pounding on it all at once.  The vendor lives in terror of  
this scenario and would like a way to detect the query plan change  
early - before a large scale rollout and subsequent disruption of  
their service.

The MySQL, Oracle, and MSSQL hinting solutions are not applicable to  
the above problems because they are only hints.  If the hints cannot  
be satisfied, the query plan silently reverts to something else.  But  
in my proposed INDEXED BY clause, if the query plan specified by the  
INDEXED BY clause cannot be used, then the query fails with an error.   
This allows developers to detect problems in a few critical queries  
early, before a large rollout.  To put it another way, the INDEXED BY  
clause is more like a CHECK constraint than a hint - only that the  
constraint applies to the query plan instead of the database content.

My original idea on how to solve the problems above was to provide  
some new API that returned an estimate of the performance for a  
prepared statement.  Then an index change that caused (for example) an  
O(logN) to O(N) performance regression could be detected using the new  
API.  That sounds like a good approach upon first hearing, but as we  
have dug deeper, we have uncovered many subtleties that make it much  
less appealing.  The INDEXED BY clause, in contrast, is simple,  
direct, and gets the job done with remarkably little fuss.



D. Richard Hipp
[EMAIL PROTECTED]



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

Reply via email to