On 23/01/2012 12:48 PM, Simon Slavin wrote:
On 23 Jan 2012, at 4:30pm, Ryan Johnson wrote:

Bump?

On 21/01/2012 2:47 PM, Ryan Johnson wrote:
On 21/01/2012 2:44 PM, Simon Slavin wrote:
On 21 Jan 2012, at 7:23pm, Ryan Johnson wrote:

It's a SQL89 join. Sqlite docs say SQL89 and SQL92 joins perform identically 
[1], which I confirmed before sending the OP.
Oh.  Okay.  If it spits out the same EXPLAIN QUERY PLAN then SQLite is 
interpreting it the same way.
Which brings us back to the original question: why does sqlite spit out a bad 
query plan when a vastly better one exists? There's no clear reason the better 
answer should have been hard to find.
Well, the approach would be to find a specific query, post the query and query 
plan, then to simplify the query until the strangeness in the query plan goes 
away.  The last simplification is the one that caused the problem.

On 21/01/2012 12:49 PM, Ryan Johnson wrote:
consider the following query:

select count(*) from orders O, Customer C where C.custkey=O.custkey and C.name like '%115';

.stats/explain reports 149999 fullscan steps for the query plan:
0|0|TABLE orders AS O
1|1|TABLE Customer AS C USING PRIMARY KEY

Putting Customer first in the FROM clause makes the query markedly faster and executes only 14999 fullscan steps. The query plan confirms the change:
0|0|TABLE Customer AS C
1|1|TABLE orders AS O WITH INDEX OrderCustomers

In what way does the above query from the OP not sufficient to demonstrate the problem? There are only two tables involved. There are only two relevant indexes -- the automatically-created PK and manually created index the query *should* use; I don't think it's reasonable to suggest dropping the primary key index. Changing "count(*)" to "*" doesn't change the chosen access path for the query. Removing the predicate doesn't affect the access path either, thought it does increase by 1000x the number of rows returned (runtime stays about the same).

Given that ANALYZE ran and established |Customer|=15k vs. |Orders|=150k, why does the optimizer choose to do a full table scan of Orders, instead of scanning Customer and accessing Orders using the index created specifically for that purpose? (It's even marked as a foreign key in the schema, though I'm pretty sure sqlite doesn't pay attention to that). This seems a cut-and-dried join reordering scenario that the optimizer should eat for lunch, and instead it seems to limit its search to selecting the best index(es) to use given the join ordering the original SQL specified. If the optimizer were merely making a bad decision -- which the second example of the OP seems to demonstrate -- then it would reorder (wrongly) the "correct" version of the SQL; it does not.

Thoughts?
Ryan

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to