Dr Hipp I second Kyan's suggestion of a pseudo-comment hinting syntax:
... that the planner hint is not interleaved inside normal SQL > syntax. Instead I propose a special comment-like syntax instead, as > Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">": > Having had to switch between databases a couple of times, I would prefer something that keeps the SQL more portable. As for names I would vote for the 'seldom()' option as the 'likely()' option is already covered by the default action. John Gillespie On 10 September 2013 20:26, Richard Hipp <d...@sqlite.org> wrote: > There is a survey question at the bottom of this message. But first some > context... > > Over on the sqlite-dev mailing list, a debate has been going on about the > best way to provide some useful hints to the query planner. The query > under discussion looks like this: > > SELECT DISTINCT aname > FROM album, composer, track > WHERE cname LIKE '%bach%' > AND composer.cid=track.cid > AND album.aid=track.aid; > > Assuming that the schema has appropriate indices and ANALYZE has been run, > SQLite does a good job of selecting an efficient query plan for the above. > But the query planner lacks a key piece of information that could help it > to do a better job. In particular, the query planner does not know how > often the subexpression "cname LIKE '%bach%'" will be true. But, it turns > out, the best query plan depends critically on this one fact. > > By default, the query planner (in SQLite 3.8.0) assumes that a > subexpression that cannot use an index will always be true. Probably this > will be tweaked in 3.8.1 so that such subexpressions will be assumed to > usually, but not always, be true. Either way, it would be useful to be > able to convey to the query planner the other extreme - that a > subexpression is usually not true. > > (Pedantic detail: "not true" is not the same as "false" in SQL because > NULL is neither true nor false.) > > There is currently code in a branch that provides a hinting mechanism using > a magic "unlikely()" function. Subexpressions contained within > "unlikely()" are assumed to usually not be true. Other than this hint to > the query planner, the unlikely() function is a complete no-op and > optimized out of the VDBE code so that it does not consume any CPU cycles. > The only purpose of the unlikely() function is to let the query planner > know that the subexpression contained in its argument is not commonly > true. So, if an application developer knows that the string "bach" seldom > occurs in composer names, then she might rewrite the query like this: > > SELECT DISTINCT aname > FROM album, composer, track > WHERE unlikely(cname LIKE '%bach%') > AND composer.cid=track.cid > AND album.aid=track.aid; > > The query planner might use this "likelihood" hint to choose a different > query plan that works better when the subexpression is commonly false. Or > it might decide that the original query plan was good enough and ignore the > hint. The query planner gets to make that decision. The application > developer is not telling the query planner what to do. The application > developer has merely provided a small amount of meta-information about the > likelihood of the subexpression being true, meta-information which the > query planner may or may not use. > > Note that the subexpression does not have to be a LIKE operator. > PostgreSQL, to name one example, estimates how often a LIKE operator will > be true based on the pattern on its right-hand side, and adjust query plans > accordingly, and some have argued for this sort of thing in SQLite. But I > want a more general solution. Suppose the subexpression involves one or > more calls to application-defined functions about which the query planner > cannot possible know anything. A general mechanism for letting the query > planner know that subexpressions are commonly not true is what is desired - > not a technique for making LIKE operators more efficient. > > SURVEY QUESTION: > > The question for today is what to call this magic hint function: > > (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) > > Please feel free to suggest other names if you think of any. > > ADDITIONAL INFORMATION: > > The current implementation allows a second argument which must be a > floating point constant between 0.0 and 1.0, inclusive. The second argument > is an estimate of the probability that the expression in the first argument > will be true. The default is 0.05. Names like "unlikely" or "seldom" work > well when this probability is small, but if the second argument is close to > 1.0, then those names seem backwards. I don't know if this matters. The > optional second argument is not guaranteed to make it into an actually > release. > -- > D. Richard Hipp > d...@sqlite.org > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users