[ https://issues.apache.org/jira/browse/IMPALA-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers reassigned IMPALA-7952:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Planner creates non-normalized binary predicates
> ------------------------------------------------
>
>                 Key: IMPALA-7952
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7952
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The FE has a "normalize binary predicates" rule that puts slots on the
> left-hand side:
> {noformat}
> 1 = id --> id = 1
> {noformat}
> Presumably this is useful. As the planner proceeds, it creates additional
> binary predicates, but tends to create them in the non-normalized form.
> Examples:
> * {{Expr.trySubstitute()}}
> * {{StmtRewriter.createJoinConjunct()}}
> * {{SingleNodePlanner.getNormalizedEqPred()}}
> * {{StmtRewriter.rewriteWhereClauseSubqueries()}}
> * {{HashjoinNode.init()}}
> Once rewrite rules are integrated into analysis, we end up with a conflict:
> should expressions created internally be exempt from some or all of the
> rewrite rules? Even from mandatory rules, such as this one?
> The solution is to allow such expressions to be rewritten to normalized form
> as part of the new integrated analyze-and-rewrite logic.
> Note that the {{trySubstitute()}} case needs more attention. Presumably the
> expressions put into the "smap" are analyzed, hence rewritten. If not, then
> there are probably other subtle bugs lurking in that code.
> Fixing this bug caused plans to change in {{PlannerTest.testJoins()}}. These
> changes suggest that one part of the analyzer works to create the
> "<slot> <op> <expr>" pattern, while other parts strive for the opposite,
> creating instability. This requires more research.
> {code:sql}
> # test that on-clause predicates referring to multiple tuple ids
> # get registered as eq join conjuncts
> select t1.*
> from (select * from functional.alltypestiny) t1
>   join (select * from functional.alltypestiny) t2 on (t1.id = t2.id)
>   join functional.alltypestiny t3 on (coalesce(t1.id, t2.id) = t3.id)
> {code}
> Plan before the fix:
> {noformat}
> PLAN-ROOT SINK
> |
> 04:HASH JOIN [INNER JOIN]
> |  hash predicates: coalesce(functional.alltypestiny.id, functional.alltypestiny.id) = t3.id
> |  runtime filters: RF000 <- t3.id
> |
> |--02:SCAN HDFS [functional.alltypestiny t3]
> |     partitions=4/4 files=4 size=460B
> |
> 03:HASH JOIN [INNER JOIN]
> |  hash predicates: functional.alltypestiny.id = functional.alltypestiny.id
> |  runtime filters: RF002 <- functional.alltypestiny.id
> |
> |--01:SCAN HDFS [functional.alltypestiny]
> |     partitions=4/4 files=4 size=460B
> |     runtime filters: RF000 -> coalesce(functional.alltypestiny.id, functional.alltypestiny.id)
> |
> 00:SCAN HDFS [functional.alltypestiny]
>    partitions=4/4 files=4 size=460B
>    runtime filters: RF000 -> coalesce(functional.alltypestiny.id, functional.alltypestiny.id), RF002 -> functional.alltypestiny.id
> {noformat}
> Plan after the fix, with the filter pushed further down the plan:
> {noformat}
> PLAN-ROOT SINK
> |
> 04:HASH JOIN [INNER JOIN]
> |  hash predicates: t3.id = coalesce(functional.alltypestiny.id, functional.alltypestiny.id)
> |
> |--02:SCAN HDFS [functional.alltypestiny t3]
> |     partitions=4/4 files=4 size=460B
> |
> 03:HASH JOIN [INNER JOIN]
> |  hash predicates: functional.alltypestiny.id = functional.alltypestiny.id
> |  runtime filters: RF002 <- functional.alltypestiny.id
> |
> |--01:SCAN HDFS [functional.alltypestiny]
> |     partitions=4/4 files=4 size=460B
> |
> 00:SCAN HDFS [functional.alltypestiny]
>    partitions=4/4 files=4 size=460B
>    runtime filters: RF002 -> functional.alltypestiny.id
> {noformat}
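
For readers unfamiliar with the rule, the "slot on the left" normalization described in the issue can be sketched roughly as follows. This is only an illustration in Python, not Impala's Java frontend code: the names Slot, Literal, BinaryPredicate, CONVERSE, normalize and to_sql are all hypothetical, and the handling of non-equality operators (flipping < to > when operands are swapped) is an assumption about what a semantics-preserving swap needs, since the issue only shows the = case.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Slot:
    """A column reference, e.g. `id` (hypothetical stand-in for a slot ref)."""
    name: str


@dataclass(frozen=True)
class Literal:
    """An integer constant, e.g. `1`."""
    value: int


@dataclass(frozen=True)
class BinaryPredicate:
    """A comparison `left op right`."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Slot, Literal, BinaryPredicate]

# Operator to use after swapping the operands so the meaning is unchanged.
CONVERSE = {"=": "=", "!=": "!=", "<": ">", "<=": ">=", ">": "<", ">=": "<="}


def normalize(pred: BinaryPredicate) -> BinaryPredicate:
    """Return an equivalent predicate with the slot on the left-hand side.

    The predicate is returned unchanged when the left operand is already a
    slot, or when the right operand is not a slot (nothing to move left).
    """
    if isinstance(pred.left, Slot) or not isinstance(pred.right, Slot):
        return pred
    return BinaryPredicate(CONVERSE[pred.op], pred.right, pred.left)


def to_sql(e: Expr) -> str:
    """Render an expression as SQL-like text for display."""
    if isinstance(e, Slot):
        return e.name
    if isinstance(e, Literal):
        return str(e.value)
    return f"{to_sql(e.left)} {e.op} {to_sql(e.right)}"


if __name__ == "__main__":
    before = BinaryPredicate("=", Literal(1), Slot("id"))
    print(to_sql(before), "-->", to_sql(normalize(before)))  # 1 = id --> id = 1
```

The point relevant to this issue is that such a rule is idempotent: applying it to a predicate that the planner creates later (for example a join conjunct built in non-normalized form) is harmless if the predicate is already normalized, which is why letting internally created expressions pass through the rewrite step seems safe in principle.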