[ https://issues.apache.org/jira/browse/IMPALA-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned IMPALA-8038: ----------------------------------- Assignee: (was: Paul Rogers) > Incorrect estimation of selectivity of OR expression > ---------------------------------------------------- > > Key: IMPALA-8038 > URL: https://issues.apache.org/jira/browse/IMPALA-8038 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Affects Versions: Impala 3.1.0 > Reporter: Paul Rogers > Priority: Major > > Suppose we have a query in which we include an OR expression: > {code:sql} > select * > from functional.alltypestiny t > where t.id = 10 > or t.bool_col = true > {code} > We can compute the join cardinality, {{|join|}} by observing that: > * Either id will match, > * Or, month will match, > * Or, both will match. > The number of matches is given as {{NDV(col)/|T|}}, and: > * {{|T| = 8}} > * {{NDV(id) = |T| = 8}} > * {{NDV(bool_col) = 2}} > * {{|T 𝜎 id=10| = |T|/NDV(id) = 1}} > * {{|T 𝜎 bool_col=true| = |T|/NDV(bool_col) = 4}} > So, if one or the other (or both) columns match, this means that we get > either 1 row or 4 rows. So, the estimated cardinality should be the maximum > of the two terms: > {noformat} > |bool_col| = NDV(bool_col) > |id| = NDV(id) > |join| = max( |T|/|id|, |T|/|bool_col| ) > = max(1, 4) > = 4 > {noformat} > Let's see what we actually get: > {noformat} > PLAN-ROOT SINK > | mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > 00:SCAN HDFS [functional.alltypestiny t] > partitions=4/4 files=4 size=460B > predicates: t.id = CAST(10 AS INT) OR t.bool_col = TRUE > tuple-ids=0 row-size=89B cardinality=5 > {noformat} > So, looks like the code uses {{|id| + |bool_col|}}, not the correct {{max( > |id|, |bool_col| )}}. > While the error here is small, it can be larger in other cases. Bad errors > lead to bad join estimation, leading to bad joins, leading to poor > performance, leading to unhappy users. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org