[ 
https://issues.apache.org/jira/browse/IMPALA-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7831:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Revisit expression rewriting integration with planner
> -----------------------------------------------------
>
>                 Key: IMPALA-7831
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7831
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> The planner performs expression rewriting. It appears that the rewrite engine 
> was added late in planner development, as an add-on step in 
> {{AnalysisContext}} after we create the plan. Since that time, it appears 
> that a number of fixes and patches have been applied to work around the 
> inevitable bugs that resulted from this placement of the logic.
> At present, the planner flow, with rewrites, is:
>  * Analyze the entire query
>  * Assign WHERE clause "conjuncts" to scan nodes, etc.
>  * Cerate theĀ full plan
>  * Rewrite the SELECT, WHERE, HAVING and GROUP BY clauses
>  * Throw away the plan create above and create a new one
> This ticket proposes to adjust the flow to incorporate rewrites earlier in 
> the process, allowing the planner to make a single pass over the query. 
> (Which will solve a number of bugs described in associated tickets.)
> h4. Background
> The above logic evolved because of a timing issue: once we assign conjuncts, 
> we have plan nodes that point to the original WHERE clause expressions. We 
> later rewrite these, but we do so by throwing away the original nodes, 
> replacing them with new ones. Since the scan and other nodes still have a 
> pointer to the old version, the rewrites can have no effect.
> To work around this, the code throws away that original plan and replans 
> using the new, rewritten nodes.
> This then creates an interesting issue. We do the full analysis (and plan) 
> because we need the column bindings in order to do the rewrite. Since 
> plan/analysis is implemented as a single black box, rewrites can't be done 
> before planning (no column binding yet) so must be done after (column 
> bindings available, but so is the entire plan.)
> Some expression nodes have incomplete implementations. For example, {{X 
> BETWEEN Y AND Z}} does not compute a cost (because it is a "virtual" node: it 
> does not exist at run time, having been rewritten to {{Y <= X AND X <= Z}}.) 
> This means that, not only do we throw away the first plan, that first plan 
> was actually wrong: it used incomplete information.
> Thus, in order to get the semantic info needed for rewrites (column 
> bindings), we end up creating an entire plan which we must then discard and 
> rebuild after doing the rewrites (so the planner has the full information.)
> h4. Alternative
> The alternative approach is to integrate expression rewrites into the planner 
> process, rather than doing them from the outside so that we make only a 
> single pass through the planner. In particular:
>  * Analyze expressions to create column bindings.
>  * Match up SELECT and GROUP BY and other expressions (if required.) GROUP BY 
> points to a SELECT clause node (so it will see rewrites) rather than each 
> SELECT expression (which will be discarded.)
>  * Rewrite SELECT and WHERE clause expressions. (Bound GROUP BY expressions 
> will see the rewrites.)
>  * Complete the plan as today.
> With this approach, we plan only once, and that plan has a full set of cost 
> information based on the rewritten expressions which the BE will execute.
> The purpose of this ticket is to track this analysis and to later propose a 
> detailed fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to