----- Original Message -----
From: "Hick Gunter" <h...@scigames.at>
To: "'General Discussion of SQLite Database'" <sqlite-users@sqlite.org>
Sent: Monday, December 15, 2014 2:40 PM
Subject: Re: [sqlite] Query Planner for Virtual Tables: link table
evaluation & transitive property of constraints not used
I would concur in that SQLite is asking "which subset of the given
constraints yields the most efficient access".
The possible query plans are
1) A() -> B(ID) -> C(LINKID)
2) C() -> B(LINKID) -> A(ID)
3) B() -> A(ID) + C(LINKID) or B() -> C(LINKID) + A(ID)
4) A() -> C() -> B(ID,LINKID) or C() -> A() -> B(ID,LINKID)
Assuming unique keys in A and C and cardinalities of a, b and c we have
estimated costs (in # of records retrieved):
1) a + a*b/a + a*b/a*1 = a + 2b
2) c + c*b/c + c*b/c*1 = c + 2b
3) b + b*1 + b*1 = 3b
4) a + a*c + a*c*b/a/c = a + a*c + b (resp. c + a*c + b)
So which is the smallest cost?
We know that b <= a*c, which makes query plan 4 at least as expensive as
plans 1 or 2 respectively.
Choosing between plans 1 and 2 means starting with the smaller of the two
tables (assume a < c).
So how do plans 1 and 3 compare? Plan 3 is better only for very sparse
link tables where b < a < c is true.
The reasoning is absolutely correct.
I forgot to mention, that most often the query has a where constraint
(generated at runtime) limiting A or C to a small subset. In that scenario
the smallest cost would be 1) or 2) respectively.
There is one more problem with plan 4): Assuming A and/or B and/or C has
many records (100.000s) the query B(ID,LINKID) is executed for each and
every record in A*C - if there is an constant cost for each B(ID,LINKID)
execution then 4) is the most expensive option.
Joe
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users