Shant Hovsepian created IMPALA-10136:
----------------------------------------

             Summary: Cardinality estimates for aggregation operations don't 
consider conjuncts on grouping expressions correctly
                 Key: IMPALA-10136
                 URL: https://issues.apache.org/jira/browse/IMPALA-10136
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Shant Hovsepian
            Assignee: Shant Hovsepian


ComputeStats() in the PlanNode calls estimateNumGroups() for the 
AggregationNode to calculate the number of groups produced by the grouping 
expressions. In a later step, applyConjunctsSelectivity() is called to adjust 
the cardinality based on the available conjuncts. However, with aggregation 
operations, certain conjuncts (i.e., those from the HAVING clause or conjuncts 
on the grouping expressions) affect the number of groups produced, which the 
current estimate does not reflect correctly.

ndv(day) = 11 

count(alltypesagg) = 10280
{code:java}
Query: explain select day, count(*) from alltypesagg where day=2 group by 1
+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.06MB Threads=4 |
| Per-Host Resource Estimates: Memory=52MB                   |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 04:EXCHANGE [UNPARTITIONED]                                |
| |                                                          |
| 03:AGGREGATE [FINALIZE]                                    |
| |  output: count:merge(*)                                  |
| |  group by: `day`                                         |
| |  row-size=12B cardinality=11                             |
| |                                                          |
| 02:EXCHANGE [HASH(`day`)]                                  |
| |                                                          |
| 01:AGGREGATE [STREAMING]                                   |
| |  output: count(*)                                        |
| |  group by: `day`                                         |
| |  row-size=12B cardinality=11                             |
| |                                                          |
| 00:SCAN HDFS [functional.alltypesagg]                      |
|    partition predicates: `day` = 2                         |
|    HDFS partitions=1/11 files=1 size=74.48KB               |
|    row-size=4B cardinality=1.00K                           |
+------------------------------------------------------------+
Fetched 24 row(s) in 0.02s
{code}
 

Given that the predicate day=2 applies to the grouping expression, the 
cardinality of the aggregation node should be 1 as opposed to 11.
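
As a rough illustration of the expected behavior (not the actual Impala 
planner code), the sketch below estimates the number of groups as the product 
of the grouping expressions' NDVs, capped at the input cardinality, and treats 
any grouping column bound by an equality conjunct as having NDV 1. The class 
name GroupCardinalitySketch, the method signature, and the 
equalityBoundColumns parameter are hypothetical and only mirror the idea 
behind estimateNumGroups(). The input cardinality of 1000 is an assumption 
read off the rounded "1.00K" scan estimate in the plan above.

```java
import java.util.Map;
import java.util.Set;

public class GroupCardinalitySketch {

    /**
     * Hypothetical estimator: multiplies the NDVs of the grouping columns and
     * caps the result at the input cardinality. A column constrained by an
     * equality conjunct (e.g. day = 2) contributes exactly one distinct value.
     */
    static long estimateNumGroups(Map<String, Long> groupingNdvs,
                                  Set<String> equalityBoundColumns,
                                  long inputCardinality) {
        if (inputCardinality <= 0) return 0;
        long groups = 1;
        for (Map.Entry<String, Long> e : groupingNdvs.entrySet()) {
            long ndv = equalityBoundColumns.contains(e.getKey())
                    ? 1
                    : Math.max(1, e.getValue());
            // Overflow-safe check for groups * ndv > inputCardinality.
            if (ndv > inputCardinality / groups) return inputCardinality;
            groups *= ndv;
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Long> ndvs = Map.of("day", 11L); // ndv(day) = 11
        long scanCardinality = 1000L;                // assumed from "1.00K"

        // Ignoring the conjunct reproduces the estimate reported in the plan.
        System.out.println(estimateNumGroups(ndvs, Set.of(), scanCardinality));
        // Accounting for day = 2 collapses the grouping column to one value.
        System.out.println(estimateNumGroups(ndvs, Set.of("day"), scanCardinality));
    }
}
```

With the numbers from this issue, the first call yields 11 and the second 
yields 1. The sketch only covers equality conjuncts on plain grouping columns; 
HAVING conjuncts and range predicates would need their own handling.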



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
