[ https://issues.apache.org/jira/browse/IMPALA-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned IMPALA-7831:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Revisit expression rewriting integration with planner
> -----------------------------------------------------
>
>                 Key: IMPALA-7831
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7831
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> The planner performs expression rewriting. It appears that the rewrite engine
> was added late in planner development, as an add-on step in
> {{AnalysisContext}} after we create the plan. Since that time, it appears
> that a number of fixes and patches have been applied to work around the
> inevitable bugs that resulted from this placement of the logic.
> At present, the planner flow, with rewrites, is:
> * Analyze the entire query
> * Assign WHERE clause "conjuncts" to scan nodes, etc.
> * Create the full plan
> * Rewrite the SELECT, WHERE, HAVING and GROUP BY clauses
> * Throw away the plan created above and create a new one
> This ticket proposes to adjust the flow to incorporate rewrites earlier in
> the process, allowing the planner to make a single pass over the query.
> (Which will solve a number of bugs described in associated tickets.)
> h4. Background
> The above logic evolved because of a timing issue: once we assign conjuncts,
> we have plan nodes that point to the original WHERE clause expressions. We
> later rewrite these, but we do so by throwing away the original nodes,
> replacing them with new ones. Since the scan and other nodes still have a
> pointer to the old version, the rewrites can have no effect.
> To work around this, the code throws away that original plan and replans
> using the new, rewritten nodes.
> This then creates an interesting issue. We do the full analysis (and plan)
> because we need the column bindings in order to do the rewrite. Since
> plan/analysis is implemented as a single black box, rewrites can't be done
> before planning (no column binding yet) so must be done after (column
> bindings available, but so is the entire plan.)
> Some expression nodes have incomplete implementations. For example,
> {{X BETWEEN Y AND Z}} does not compute a cost (because it is a "virtual"
> node: it does not exist at run time, having been rewritten to
> {{Y <= X AND X <= Z}}.)
> This means that, not only do we throw away the first plan, that first plan
> was actually wrong: it used incomplete information.
> Thus, in order to get the semantic info needed for rewrites (column
> bindings), we end up creating an entire plan which we must then discard and
> rebuild after doing the rewrites (so the planner has the full information.)
> h4. Alternative
> The alternative approach is to integrate expression rewrites into the planner
> process, rather than doing them from the outside, so that we make only a
> single pass through the planner. In particular:
> * Analyze expressions to create column bindings.
> * Match up SELECT and GROUP BY and other expressions (if required.) GROUP BY
> points to a SELECT clause node (so it will see rewrites) rather than each
> SELECT expression (which will be discarded.)
> * Rewrite SELECT and WHERE clause expressions. (Bound GROUP BY expressions
> will see the rewrites.)
> * Complete the plan as today.
> With this approach, we plan only once, and that plan has a full set of cost
> information based on the rewritten expressions which the BE will execute.
> The purpose of this ticket is to track this analysis and to later propose a
> detailed fix.
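
The timing issue described in the ticket can be illustrated with a small, self-contained Python sketch. Everything in it is hypothetical and chosen for illustration: Expr, ScanNode, rewrite() and the cost numbers are not Impala's actual frontend classes or cost model. It only shows why a plan node that captured the original BETWEEN expression never sees the rewritten one, and why rewriting before conjunct assignment avoids the second planning pass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expr:
    """Toy immutable expression node (hypothetical, not Impala's Expr)."""
    op: str            # 'col', 'between', 'le', or 'and'
    args: tuple = ()
    name: str = ""

    def cost(self) -> Optional[int]:
        # Mimics the ticket: the "virtual" BETWEEN node computes no cost.
        if self.op == "between":
            return None
        if self.op == "col":
            return 1
        costs = [a.cost() for a in self.args]
        if any(c is None for c in costs):
            return None  # an unrewritten child makes the total unknown
        return 1 + sum(costs)


def rewrite(e: Expr) -> Expr:
    """Build a NEW tree in which X BETWEEN Y AND Z becomes Y <= X AND X <= Z."""
    args = tuple(rewrite(a) for a in e.args)
    if e.op == "between":
        x, y, z = args
        return Expr("and", (Expr("le", (y, x)), Expr("le", (x, z))))
    return Expr(e.op, args, e.name)


@dataclass
class ScanNode:
    """Toy plan node that holds references to its assigned conjuncts."""
    conjuncts: list


x, y, z = Expr("col", name="x"), Expr("col", name="y"), Expr("col", name="z")
where = Expr("between", (x, y, z))

# Flow described as current: assign conjuncts first, rewrite afterwards.
# The rewrite produces new nodes, so the scan still points at the old one.
stale_scan = ScanNode([where])
rewritten = rewrite(where)

# Flow proposed in the ticket: rewrite first, then assign conjuncts once.
scan = ScanNode([rewrite(where)])
```

In this sketch, stale_scan still holds the BETWEEN node with no cost, while scan holds the rewritten AND node with a complete cost, which is the single-pass outcome the ticket is aiming for.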