----- Original Message ----- From: "Hick Gunter" <h...@scigames.at>
To: "'General Discussion of SQLite Database'" <sqlite-users@sqlite.org>
Sent: Monday, December 15, 2014 2:40 PM
Subject: Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used


I would concur in that SQLite is asking "which subset of the given constraints yields the most efficient access".

The possible query plans are

1) A() -> B(ID) -> C(LINKID)

2) C() -> B(LINKID) -> A(ID)

3) B() -> A(ID) + C(LINKID) or B() -> C(LINKID) + A(ID)

4) A() -> C() -> B(ID,LINKID) or C() -> A() -> B(ID,LINKID)

Assuming unique keys in A and C and cardinalities of a, b and c we have estimated costs (in # of records retrieved):

1) a + a*b/a + a*b/a*1 = a + 2b

2) c + c*b/c + c*b/c*1 = c + 2b

3) b + b*1 + b*1 = 3b

4) a + a*c + a*c*b/a/c = a + a*c + b (resp. c + a*c + b)

So which is the smallest cost?

We know that b <= a*c, which makes query plan 4 at least as expensive as plans 1 or 2 respectively.

Choosing between plans 1 and 2 means starting with the smaller of the two tables (assume a < c).

So how do plans 1 and 3 compare? Plan 3 is better only for very sparse link tables where b < a < c is true.


The reasoning is absolutely correct.

I forgot to mention, that most often the query has a where constraint (generated at runtime) limiting A or C to a small subset. In that scenario the smallest cost would be 1) or 2) respectively.

There is one more problem with plan 4): Assuming A and/or B and/or C has many records (100.000s) the query B(ID,LINKID) is executed for each and every record in A*C - if there is an constant cost for each B(ID,LINKID) execution then 4) is the most expensive option.

Joe
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to