honghui.Liu created CALCITE-5158:
------------------------------------

             Summary: count(1) with subquery count(distinct) gives wrong 
results with hive.optimize.distinct.rewrite=true and cbo on
                 Key: CALCITE-5158
                 URL: https://issues.apache.org/jira/browse/CALCITE-5158
             Project: Calcite
          Issue Type: Bug
    Affects Versions: 1.19.0
            Reporter: honghui.Liu


{code:java}
create table count_distinct(a int, b int);
insert into table count_distinct values (1,2),(2,3);
set hive.execution.engine=tez;
set hive.cbo.enable=true;
set hive.optimize.distinct.rewrite=true;
select count(1) from ( 
      select count(distinct a) from count_distinct
) tmp; {code}
This query returns a wrong result when hive.optimize.distinct.rewrite is true,
which is the default in all 3.x versions. The query returns 2, but the expected
result is 1.

Before CBO optimization, the RelNode tree is:
{code:java}
HiveProject(_o__c0=[$0])
  HiveAggregate(group=[{}], agg#0=[count($0)])
    HiveProject($f0=[1])
      HiveProject(_o__c0=[$0])
        HiveAggregate(group=[{}], agg#0=[count(DISTINCT $0)])
          HiveProject($f0=[$0])
            HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
After HiveExpandDistinctAggregatesRule is applied, the RelNode tree is:
{code:java}
HiveProject(_o__c0=[$0])
  HiveAggregate(group=[{}], agg#0=[count($0)])
    HiveProject($f0=[1])
      HiveProject(_o__c0=[$0])
        HiveAggregate(group=[{}], agg#0=[count($0)])
          HiveAggregate(group=[{0}])
            HiveProject($f0=[$0])
              HiveProject($f0=[$0])
                HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
That is, count(distinct xx) is converted to count(xx) from (select xx from
table_name group by xx).
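
The rewrite itself is result-preserving, which a quick check outside Hive can illustrate. The sketch below uses Python's built-in sqlite3 module as a stand-in engine (an assumption for illustration only; it does not exercise Hive or Calcite) with the same table and data as above:

```python
import sqlite3

# In-memory SQLite database used only as a stand-in engine (not Hive).
conn = sqlite3.connect(":memory:")
conn.execute("create table count_distinct(a int, b int)")
conn.execute("insert into count_distinct values (1,2),(2,3)")

# Original form of the inner query.
distinct_count = conn.execute(
    "select count(distinct a) from count_distinct"
).fetchone()[0]

# Rewritten form: group by a first, then a plain count over the groups.
rewritten_count = conn.execute(
    "select count(a) from (select a from count_distinct group by a) t"
).fetchone()[0]

# Outer count(1) over the rewritten subquery: the inner global aggregate
# emits exactly one row, so the outer count is 1.
outer_count = conn.execute(
    "select count(1) from ("
    " select count(a) from (select a from count_distinct group by a) t"
    ") tmp"
).fetchone()[0]

print(distinct_count, rewritten_count, outer_count)  # prints: 2 2 1
```

Both forms of the inner query agree, and the outer count(1) is 1, so the wrong result is unlikely to come from the rewrite on its own.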

After projection pruning, the RelNode tree is:
{code:java}
HiveAggregate(group=[{}], agg#0=[count()])
  HiveProject(DUMMY=[0])
    HiveAggregate(group=[{}])
      HiveAggregate(group=[{0}])
        HiveProject(a=[$0])
          HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
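In this plan the inner count has been pruned away, leaving
HiveAggregate(group=[{}]) with no aggregate calls on top of
HiveAggregate(group=[{0}]). The outer count() then appears to count the two rows
produced by the group-by on a rather than a single row, which yields 2 instead
of 1.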
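
A toy model of the pruned plan can make the row counts concrete. This is only a sketch in plain Python, not Hive or Calcite code: the helper names are hypothetical, and the "pass-through" behaviour is one possible reading of the symptom, not something the report confirms.

```python
def aggregate(rows, group_keys):
    """Aggregate with no aggregate calls: one output row per distinct group.

    With an empty group set this is a global aggregate and emits exactly
    one (zero-column) row.
    """
    if not group_keys:
        return [()]
    groups = []
    for row in rows:
        key = tuple(row[i] for i in group_keys)
        if key not in groups:
            groups.append(key)
    return groups


def passthrough_aggregate(rows, group_keys):
    """Hypothetical faulty behaviour (an assumption): an empty-group
    aggregate with no aggregate calls forwards its input unchanged."""
    return rows if not group_keys else aggregate(rows, group_keys)


table = [(1, 2), (2, 3)]

# HiveAggregate(group=[{0}]): one row per distinct value of a.
grouped = aggregate(table, [0])

# HiveAggregate(group=[{}]) followed by the outer count(); the DUMMY
# projection in between does not change the number of rows.
correct = len(aggregate(grouped, []))
faulty = len(passthrough_aggregate(grouped, []))

print(correct, faulty)  # prints: 1 2
```

Under this reading, the expected result 1 corresponds to the empty-group aggregate collapsing its input to one row, and the observed result 2 corresponds to that aggregate having no effect.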


