Aman Sinha created IMPALA-12006: ----------------------------------- Summary: Left outer join cardinality highly overestimated Key: IMPALA-12006 URL: https://issues.apache.org/jira/browse/IMPALA-12006 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 4.2.0 Reporter: Aman Sinha
In one of the use cases, we have seen the cardinality estimate for left outer join highly overestimated. The plan is complex and only a partial output is shown below (with the column names anonymized): {noformat} 57:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | hash-table-id=121 | hash predicates: a.id = a.id | fk/pk conjuncts: none | mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0 | tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N,29N,31N row-size=2.63KB cardinality=3.90T | in pipelines: 06(GETNEXT), 26(OPEN) | |--F1253:PLAN FRAGMENT hosts=13 instances=13 | | Per-Instance Resources: mem-estimate=1.10GB mem-reservation=204.00MB thread-reservation=1 | JOIN BUILD | | join-table-id=121 plan-id=122 cohort-id=25 | | build expressions: a.id | | mem-estimate=1.08GB mem-reservation=204.00MB spill-buffer=2.00MB thread-reservation=0 | | | 1758:EXCHANGE [BROADCAST] | | mem-estimate=20.87MB mem-reservation=0B thread-reservation=0 | | tuple-ids=29,31 row-size=85B cardinality=9.56M | | in pipelines: 26(GETNEXT) | | ... ... ... 56:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | hash predicates: ifnull(a.id, a.id) = a.id | fk/pk conjuncts: assumed fk/pk | mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0 | tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N row-size=2.55KB cardinality=14.97G | in pipelines: 06(GETNEXT), 22(OPEN) {noformat} Note that the left input of the join is estimated as 14.97G rows, right input as 9.56M rows but the LOJ estimate is 3.9T rows. We need to investigate why that is so and fix it. The NDV of the based column involved in the join is 36661 but in the lower join there are functions involved in the join condition. -- This message was sent by Atlassian Jira (v8.20.10#820010)