(6) maybe(EXPR) -----Ursprüngliche Nachricht----- Von: Richard Hipp [mailto:d...@sqlite.org] Gesendet: Dienstag, 10. September 2013 21:27 An: General Discussion of SQLite Database Betreff: [sqlite] Hints for the query planner
There is a survey question at the bottom of this message. But first some context... Over on the sqlite-dev mailing list, a debate has been going on about the best way to provide some useful hints to the query planner. The query under discussion looks like this: SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' AND composer.cid=track.cid AND album.aid=track.aid; Assuming that the schema has appropriate indices and ANALYZE has been run, SQLite does a good job of selecting an efficient query plan for the above. But the query planner lacks a key piece of information that could help it to do a better job. In particular, the query planner does not know how often the subexpression "cname LIKE '%bach%'" will be true. But, it turns out, the best query plan depends critically on this one fact. By default, the query planner (in SQLite 3.8.0) assumes that a subexpression that cannot use an index will always be true. Probably this will be tweaked in 3.8.1 so that such subexpressions will be assumed to usually, but not always, be true. Either way, it would be useful to be able to convey to the query planner the other extreme - that a subexpression is usually not true. (Pedantic detail: "not true" is not the same as "false" in SQL because NULL is neither true nor false.) There is currently code in a branch that provides a hinting mechanism using a magic "unlikely()" function. Subexpressions contained within "unlikely()" are assumed to usually not be true. Other than this hint to the query planner, the unlikely() function is a complete no-op and optimized out of the VDBE code so that it does not consume any CPU cycles. The only purpose of the unlikely() function is to let the query planner know that the subexpression contained in its argument is not commonly true. So, if an application developer knows that the string "bach" seldom occurs in composer names, then she might rewrite the query like this: SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%') AND composer.cid=track.cid AND album.aid=track.aid; The query planner might use this "likelihood" hint to choose a different query plan that works better when the subexpression is commonly false. Or it might decide that the original query plan was good enough and ignore the hint. The query planner gets to make that decision. The application developer is not telling the query planner what to do. The application developer has merely provided a small amount of meta-information about the likelihood of the subexpression being true, meta-information which the query planner may or may not use. Note that the subexpression does not have to be a LIKE operator. PostgreSQL, to name one example, estimates how often a LIKE operator will be true based on the pattern on its right-hand side, and adjust query plans accordingly, and some have argued for this sort of thing in SQLite. But I want a more general solution. Suppose the subexpression involves one or more calls to application-defined functions about which the query planner cannot possible know anything. A general mechanism for letting the query planner know that subexpressions are commonly not true is what is desired - not a technique for making LIKE operators more efficient. SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. ADDITIONAL INFORMATION: The current implementation allows a second argument which must be a floating point constant between 0.0 and 1.0, inclusive. The second argument is an estimate of the probability that the expression in the first argument will be true. The default is 0.05. Names like "unlikely" or "seldom" work well when this probability is small, but if the second argument is close to 1.0, then those names seem backwards. I don't know if this matters. The optional second argument is not guaranteed to make it into an actually release. -- D. Richard Hipp d...@sqlite.org _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -------------------------------------------------------------------------- Gunter Hick Software Engineer Scientific Games International GmbH Klitschgasse 2 – 4, A - 1130 Vienna, Austria FN 157284 a, HG Wien Tel: +43 1 80100 0 E-Mail: h...@scigames.at This e-mail is confidential and may well also be legally privileged. If you have received it in error, you are on notice as to its status and accordingly please notify us immediately by reply e-mail and then delete this message from your system. Please do not copy it or use it for any purposes, or disclose its contents to any person as to do so could be a breach of confidence. Thank you for your cooperation. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users