QiangCai opened a new pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681
 
 
    ### Why is this PR needed?
   Spark's ReuseExchange rule can't recognize identical Exchange plans on carbon tables.
   As a result, queries on carbon tables don't reuse Exchange nodes, which leads to poor performance.
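   Roughly speaking, Spark's ReuseExchange rule buckets the Exchange nodes of a physical plan by output schema and replaces any Exchange that produces the same result as an earlier one with a reference to that earlier one. The Scala sketch below mimics the idea on toy plan nodes. `Plan`, `Scan`, `Exchange`, `Reused`, and `reuseExchanges` are hypothetical names, not Spark classes, and plain `==` stands in for Spark's canonicalized `sameResult` comparison.

   ```scala
   import scala.collection.mutable

   // Hypothetical, simplified plan nodes; these are NOT Spark's real classes.
   sealed trait Plan
   final case class Scan(table: String, columns: Seq[String]) extends Plan
   final case class Exchange(output: Seq[String], child: Plan) extends Plan
   final case class Reused(original: Exchange) extends Plan

   // Bucket exchanges by output schema, then replace any exchange equal to an
   // earlier one in the same bucket with a Reused node pointing at that one.
   // Real Spark compares canonicalized plans via sameResult; == stands in here.
   def reuseExchanges(exchanges: Seq[Exchange]): Seq[Plan] = {
     val seen = mutable.HashMap.empty[Seq[String], mutable.ArrayBuffer[Exchange]]
     exchanges.map { e =>
       val bucket = seen.getOrElseUpdate(e.output, mutable.ArrayBuffer.empty[Exchange])
       bucket.find(_ == e) match {
         case Some(first) => Reused(first)
         case None =>
           bucket += e
           e
       }
     }
   }

   // The two UNION branches from the example below: same table, same shuffle.
   val branch1 = Exchange(Seq("c2", "sum"), Scan("default.t1", Seq("c1", "c2")))
   val branch2 = Exchange(Seq("c2", "sum"), Scan("default.t1", Seq("c1", "c2")))

   val rewritten = reuseExchanges(Seq(branch1, branch2))
   val reusedCount = rewritten.count(_.isInstanceOf[Reused])
   println(s"exchanges reused: $reusedCount") // exchanges reused: 1
   ```

   The key point is that reuse only happens when the two subtrees compare equal, so every field reachable from the scan, including whatever object describes the table's files, must compare equal as well.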
   
   For example:
   
   ```
   create table t1(c1 int, c2 string) using carbondata
   
   explain
   select c2, sum(c1) from t1 group by c2
   union all
   select c2, sum(c1) from t1 group by c2
   ```
   Before this change, the physical plan is as follows:
   ```
   Union
   :- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   :  +- Exchange hashpartitioning(c2#37, 200)
   :     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
   :        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   +- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
      +- Exchange hashpartitioning(c2#37, 200)
         +- *(3) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
            +- *(3) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   ```
   
   After this change, the physical plan is as follows:
   
   ```
   Union
   :- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   :  +- Exchange hashpartitioning(c2#37, 200)
   :     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
   :        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   +- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
      +- ReusedExchange [c2#37, sum#54L], Exchange hashpartitioning(c2#37, 200)
   ```
   
   
    ### What changes were proposed in this PR?
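   Change the `CarbonFileIndex` class to a case class, so that two `CarbonFileIndex` instances for the same table compare equal and Spark can recognize the two scans, and the Exchanges above them, as producing the same result.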
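   The effect of the change follows from Scala's equality semantics: a plain class inherits reference equality from `AnyRef`, whereas a case class gets a compiler-generated `equals`/`hashCode` that compares its fields. In the sketch below, `PlainFileIndex`, `CaseFileIndex`, `Scan`, and the `/warehouse/t1` path are hypothetical stand-ins (not CarbonData's real `CarbonFileIndex`), and it assumes each UNION branch ends up holding its own index instance.

   ```scala
   // Hypothetical stand-ins; neither is CarbonData's real CarbonFileIndex.
   class PlainFileIndex(val tablePath: String) // default equals: reference identity
   case class CaseFileIndex(tablePath: String) // generated equals/hashCode: by fields

   // A toy scan node that, like a case class plan node, compares all its fields,
   // including the index object.
   final case class Scan(table: String, index: AnyRef)

   // Assumption: each branch builds its own index object for the same table.
   val plainEqual = Scan("default.t1", new PlainFileIndex("/warehouse/t1")) ==
     Scan("default.t1", new PlainFileIndex("/warehouse/t1"))
   val caseEqual = Scan("default.t1", CaseFileIndex("/warehouse/t1")) ==
     Scan("default.t1", CaseFileIndex("/warehouse/t1"))

   println(s"plain class: $plainEqual, case class: $caseEqual") // plain class: false, case class: true
   ```

   Because a case class redefines `hashCode` together with `equals`, equal indexes also behave consistently as keys in hash-based collections.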
   
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
