Chao Sun created HIVE-15796:
-------------------------------

             Summary: HoS: poor reducer parallelism when operator stats are not accurate
                 Key: HIVE-15796
                 URL: https://issues.apache.org/jira/browse/HIVE-15796
             Project: Hive
          Issue Type: Improvement
          Components: Statistics
    Affects Versions: 2.2.0
            Reporter: Chao Sun
            Assignee: Chao Sun


In HoS we currently use operator stats to determine reducer parallelism. 
However, operator stats are often inaccurate, especially when column stats 
are not available. This can produce extremely poor reducer parallelism and 
cause HoS queries to effectively run forever. 

This JIRA proposes an alternative way to compute reducer parallelism, 
similar to what MR does. Here's the approach we are suggesting:
1. when computing the parallelism for a MapWork, use stats associated with the 
TableScan operator;
2. when computing the parallelism for a ReduceWork, use the *maximum* 
parallelism from all its parents.
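
The two rules above can be sketched roughly as follows. This is only an 
illustration, not the actual patch: the Work class, the estimate() helper and 
the parameter names are hypothetical stand-ins, and the size-based formula 
(input bytes divided by a bytes-per-reducer setting, capped by a maximum) is 
an assumption modeled on how the MR path estimates reducers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelismSketch {

    /** Hypothetical stand-in for a MapWork or ReduceWork node in the plan. */
    static final class Work {
        final boolean isMap;
        final long tableScanBytes;   // data size from the TableScan stats (map work only)
        final List<Work> parents;    // upstream works (reduce work only)

        private Work(boolean isMap, long tableScanBytes, List<Work> parents) {
            this.isMap = isMap;
            this.tableScanBytes = tableScanBytes;
            this.parents = parents;
        }

        static Work map(long tableScanBytes) {
            return new Work(true, tableScanBytes, new ArrayList<Work>());
        }

        static Work reduce(Work... parents) {
            return new Work(false, 0L, Arrays.asList(parents));
        }
    }

    /** Assumed MR-style estimate: ceil(bytes / bytesPerReducer), clamped to [1, maxReducers]. */
    static int estimate(long inputBytes, long bytesPerReducer, int maxReducers) {
        long n = (inputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1L, Math.min((long) maxReducers, n));
    }

    /**
     * Rule 1: a map work derives its parallelism from the TableScan data size.
     * Rule 2: a reduce work takes the maximum parallelism among its parents.
     */
    static int parallelism(Work work, long bytesPerReducer, int maxReducers) {
        if (work.isMap) {
            return estimate(work.tableScanBytes, bytesPerReducer, maxReducers);
        }
        int best = 1;
        for (Work parent : work.parents) {
            best = Math.max(best, parallelism(parent, bytesPerReducer, maxReducers));
        }
        return best;
    }
}
```

With 256 MB per reducer, for example, a reduce work fed by a 100 MB scan and a 
1 GB scan would get max(1, 4) = 4 reducers in this sketch, regardless of what 
the intermediate operator stats claim.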






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
