Re: [sqlite] Hints for the query planner

Gabor Grothendieck Thu, 12 Sep 2013 05:04:05 -0700

PostgreSQL supports

   create index <indexname> on <table> ( <expression> )


Note that it allows an expression and not just a column name.

See:

  http://www.postgresql.org/docs/9.1/static/sql-createindex.html

Perhaps indexing the expression in question would be an alternative
that would keep the performance info separate from the select.




On Tue, Sep 10, 2013 at 3:26 PM, Richard Hipp <d...@sqlite.org> wrote:
> There is a survey question at the bottom of this message.  But first some
> context...
>
> Over on the sqlite-dev mailing list, a debate has been going on about the
> best way to provide some useful hints to the query planner.  The query
> under discussion looks like this:
>
> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE cname LIKE '%bach%'
>    AND composer.cid=track.cid
>    AND album.aid=track.aid;
>
> Assuming that the schema has appropriate indices and ANALYZE has been run,
> SQLite does a good job of selecting an efficient query plan for the above.
> But the query planner lacks a key piece of information that could help it
> to do a better job.  In particular, the query planner does not know how
> often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
> out, the best query plan depends critically on this one fact.
>
> By default, the query planner (in SQLite 3.8.0) assumes that a
> subexpression that cannot use an index will always be true.  Probably this
> will be tweaked in 3.8.1 so that such subexpressions will be assumed to
> usually, but not always, be true.  Either way, it would be useful to be
> able to convey to the query planner the other extreme - that a
> subexpression is usually not true.
>
> (Pedantic detail:  "not true" is not the same as "false" in SQL because
> NULL is neither true nor false.)
>
> There is currently code in a branch that provides a hinting mechanism using
> a magic "unlikely()" function.  Subexpressions contained within
> "unlikely()" are assumed to usually not be true.  Other than this hint to
> the query planner, the unlikely() function is a complete no-op and
> optimized out of the VDBE code so that it does not consume any CPU cycles.
> The only purpose of the unlikely() function is to let the query planner
> know that the subexpression contained in its argument is not commonly
> true.  So, if an application developer knows that the string "bach" seldom
> occurs in composer names, then she might rewrite the query like this:
>
> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE unlikely(cname LIKE '%bach%')
>    AND composer.cid=track.cid
>    AND album.aid=track.aid;
>
> The query planner might use this "likelihood" hint to choose a different
> query plan that works better when the subexpression is commonly false.  Or
> it might decide that the original query plan was good enough and ignore the
> hint.  The query planner gets to make that decision.  The application
> developer is not telling the query planner what to do. The application
> developer has merely provided a small amount of meta-information about the
> likelihood of the subexpression being true, meta-information which the
> query planner may or may not use.
>
> Note that the subexpression does not have to be a LIKE operator.
> PostgreSQL, to name one example, estimates how often a LIKE operator will
> be true based on the pattern on its right-hand side, and adjust query plans
> accordingly, and some have argued for this sort of thing in SQLite.  But I
> want a more general solution.  Suppose the subexpression involves one or
> more calls to application-defined functions about which the query planner
> cannot possible know anything.  A general mechanism for letting the query
> planner know that subexpressions are commonly not true is what is desired -
> not a technique for making LIKE operators more efficient.
>
> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
> Please feel free to suggest other names if you think of any.
>
> ADDITIONAL INFORMATION:
>
> The current implementation allows a second argument which must be a
> floating point constant between 0.0 and 1.0, inclusive. The second argument
> is an estimate of the probability that the expression in the first argument
> will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
> well when this probability is small, but if the second argument is close to
> 1.0, then those names seem backwards.  I don't know if this matters.  The
> optional second argument is not guaranteed to make it into an actually
> release.
> --
> D. Richard Hipp
> d...@sqlite.org
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Hints for the query planner

Reply via email to