Aman Sinha created IMPALA-10697:
-----------------------------------

             Summary: NDV for rank() expression is incorrect
                 Key: IMPALA-10697
                 URL: https://issues.apache.org/jira/browse/IMPALA-10697
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Aman Sinha
In the following query, the cardinality of the final Aggregate is always 1, regardless of the cardinality of its child. In the plan below, 04:AGGREGATE reports cardinality=1 even though its child 03:SELECT reports cardinality=999. This is because the NDV of an analytic expression such as rank() appears to always be computed as 1, which is incorrect.

{noformat}
Query: explain select rnk, count(*) from ( select * from (SELECT rank() OVER (ORDER BY ss_net_profit ASC) rnk FROM store_sales ss1 WHERE ss_store_sk = 4) v1 where rnk < 1000) v2 group by rnk

Max Per-Host Resource Reservation: Memory=13.94MB Threads=3
Per-Host Resource Estimates: Memory=142MB
Analyzed query: SELECT rnk, count(*) FROM (SELECT * FROM (SELECT rank() OVER
(ORDER BY ss_net_profit ASC) rnk FROM tpcds.store_sales ss1 WHERE ss_store_sk =
CAST(4 AS INT)) v1 WHERE rnk < CAST(1000 AS BIGINT)) v2 GROUP BY rnk

F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=14.01MB mem-reservation=5.94MB thread-reservation=1
PLAN-ROOT SINK
|  output exprs: rnk, count(*)
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:AGGREGATE [FINALIZE]
|  output: count(*)
|  group by: rank()
|  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=5 row-size=16B cardinality=1
|  in pipelines: 04(GETNEXT), 06(OPEN)
|
03:SELECT
|  predicates: rank() < CAST(1000 AS BIGINT)
|  mem-estimate=0B mem-reservation=0B thread-reservation=0
|  tuple-ids=8,7 row-size=16B cardinality=999
|  in pipelines: 06(GETNEXT)
|
02:ANALYTIC
|  functions: rank()
|  order by: ss_net_profit ASC
|  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=8,7 row-size=16B cardinality=999
|  in pipelines: 06(GETNEXT)
|
06:TOP-N
|  order by: ss_net_profit ASC
|  limit with ties: 999
|  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0
|  tuple-ids=8 row-size=8B cardinality=999
|  in pipelines: 06(GETNEXT), 01(OPEN)
|
05:EXCHANGE [UNPARTITIONED]
|  mem-estimate=37.72KB mem-reservation=0B thread-reservation=0
|  tuple-ids=8 row-size=8B cardinality=999
|  in pipelines: 01(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=128.01MB mem-reservation=8.00MB thread-reservation=2
01:TOP-N
|  order by: ss_net_profit ASC
|  limit with ties: 999
|  source expr: rank() < CAST(1000 AS BIGINT)
|  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0
|  tuple-ids=8 row-size=8B cardinality=999
|  in pipelines: 01(GETNEXT), 00(OPEN)
|
00:SCAN HDFS [tpcds.store_sales ss1, RANDOM]
   HDFS partitions=1824/1824 files=1824 size=346.60MB
   predicates: ss_store_sk = CAST(4 AS INT)
   stored statistics:
     table: rows=2.88M size=346.60MB
     partitions: 1824/1824 rows=2.88M
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=130.09K
   mem-estimate=128.00MB mem-reservation=8.00MB thread-reservation=1
   tuple-ids=0 row-size=8B cardinality=480.07K
   in pipelines: 00(GETNEXT)
{noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
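To illustrate why an NDV of 1 collapses the aggregate estimate, here is a minimal sketch of a common group-by cardinality model. It assumes, rather than quotes Impala's AggregationNode code, that the number of groups is the product of the grouping expressions' NDVs, capped by the child's cardinality. The function names `estimate_agg_cardinality` and `rank_ndv_estimate` are hypothetical, and treating the child's row count as the NDV upper bound for rank() is an illustrative assumption, not the fix adopted for this issue. The 999 input rows are taken from the 03:SELECT node in the plan.

```python
from math import prod


def estimate_agg_cardinality(input_cardinality, group_by_ndvs):
    """Estimate the number of groups emitted by a GROUP BY.

    Simplified model (an assumption, not Impala's exact code): multiply the
    NDVs of the grouping expressions, then cap the result at the number of
    input rows, since an aggregation cannot emit more groups than it reads.
    """
    if not group_by_ndvs:
        return 1  # no grouping expressions: a single output row
    return min(prod(group_by_ndvs), input_cardinality)


def rank_ndv_estimate(input_cardinality):
    """Hypothetical NDV estimate for rank(): every row can get its own rank
    (tied rows share one), so the input row count is an upper bound."""
    return input_cardinality


# 03:SELECT feeds 999 rows into 04:AGGREGATE in the plan above.
child_card = 999
print(estimate_agg_cardinality(child_card, [1]))  # NDV(rank()) == 1 -> 1
print(estimate_agg_cardinality(child_card, [rank_ndv_estimate(child_card)]))  # -> 999
```

With NDV(rank()) fixed at 1, the product is 1 no matter how many rows reach the aggregate, which matches the cardinality=1 on 04:AGGREGATE. Bounding the NDV by the child's cardinality instead yields an upper-bound estimate of 999 groups.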