Gautam Kumar Parai created DRILL-4771:
-----------------------------------------

             Summary: Drill should avoid doing the same join twice if count(distinct) exists
                 Key: DRILL-4771
                 URL: https://issues.apache.org/jira/browse/DRILL-4771
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.6.0
            Reporter: Gautam Kumar Parai
            Assignee: Gautam Kumar Parai


When a query has one distinct aggregate and one or more non-distinct 
aggregates, Drill need not produce the join-based plan, which evaluates the 
same join twice. Instead, we can generate multi-phase aggregates over a single 
join. Another approach would be to use grouping sets. However, Drill does not 
support grouping sets and currently relies on the join-based plan (see the 
plan below):

{code}
select emp.empno, count(*), avg(distinct dept.deptno) 
from sales.emp emp inner join sales.dept dept 
on emp.deptno = dept.deptno 
group by emp.empno

LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3])
  LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
    LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
      LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
        LogicalJoin(condition=[=($7, $9)], joinType=[inner])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)])
      LogicalAggregate(group=[{0, 1}])
        LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
          LogicalJoin(condition=[=($7, $9)], joinType=[inner])
            LogicalTableScan(table=[[CATALOG, SALES, EMP]])
            LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

The more efficient form should look like this:

{code}

select emp.empno, count(*), avg(distinct dept.deptno) 
from sales.emp emp inner join sales.dept dept 
on emp.deptno = dept.deptno 
group by emp.empno

LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)])
  LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
    LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
      LogicalJoin(condition=[=($7, $9)], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

{code}
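
The equivalence behind the multi-phase plan can be checked outside Drill. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than Drill or Calcite. The tables, sample rows, and variable names are made up for illustration and are not the real SALES schema. It runs the query as written, then a hand-written two-phase form (group by empno and deptno with a count, then sum the counts and average the now-distinct deptno values), and compares the results.

```python
import sqlite3

# Hypothetical sample data; not the real SALES.EMP / SALES.DEPT tables.
# empno and dept.deptno are deliberately non-unique so the join yields duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, deptno INTEGER);
    CREATE TABLE dept (deptno INTEGER);
    INSERT INTO emp VALUES (1, 10), (1, 10), (1, 20), (2, 30), (3, 99);
    INSERT INTO dept VALUES (10), (20), (30), (30);
""")

# The query from the issue (schema prefix dropped for SQLite).
original_rows = conn.execute("""
    SELECT emp.empno, COUNT(*), AVG(DISTINCT dept.deptno)
    FROM emp INNER JOIN dept ON emp.deptno = dept.deptno
    GROUP BY emp.empno
    ORDER BY emp.empno
""").fetchall()

# Two-phase form: the inner aggregate groups by (empno, deptno) and counts rows,
# the outer aggregate sums those counts and averages deptno, which is already
# distinct within each empno group. The join is evaluated only once.
two_phase_rows = conn.execute("""
    SELECT empno, SUM(cnt), AVG(deptno)
    FROM (
        SELECT emp.empno AS empno, dept.deptno AS deptno, COUNT(*) AS cnt
        FROM emp INNER JOIN dept ON emp.deptno = dept.deptno
        GROUP BY emp.empno, dept.deptno
    )
    GROUP BY empno
    ORDER BY empno
""").fetchall()

assert original_rows == two_phase_rows
print(two_phase_rows)
```

This only demonstrates that the two forms agree on one small data set; it says nothing about how Drill's planner would implement or cost the rewrite.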



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
