Aman Sinha created DRILL-4188:
---------------------------------

             Summary: Change the default value of planner.enable_hash_single_key to false
                 Key: DRILL-4188
                 URL: https://issues.apache.org/jira/browse/DRILL-4188
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


The planner.enable_hash_single_key flag controls how HashJoin and MergeJoin 
plans hash-distribute both sides of a multi-column join 
(e.g. T1.a1 = T2.a2 AND T1.b1 = T2.b2).  The default value of this parameter 
is true, which means that Drill generates multiple plans, each with a hash 
distribution on only one of the join columns.  The final plan is chosen based 
on cost.

However, because column statistics are not available, this approach is 
problematic.  If all plans cost the same, we could end up picking the first 
column for hash distribution, and if that column has a low number of distinct 
values, there could be substantial skew in the distribution.
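
To illustrate the skew concern, here is a rough sketch that is not taken from 
Drill.  It assumes 8 receiving fragments, a join column a1 with only 3 distinct 
values, and a second join column b1 with 10,000 distinct values.  The fragment 
count, the row generator and the use of CRC32 as a stand-in hash function are 
all assumptions made for the example; Drill's actual hash function and 
fragment assignment differ.

```python
import random
import zlib
from collections import Counter

NUM_FRAGMENTS = 8  # hypothetical number of receiving fragments


def fragment_for(key_values, num_fragments=NUM_FRAGMENTS):
    """Map a tuple of join-key values to a fragment with a stable hash.

    CRC32 is only a stand-in here; it is not the hash function Drill uses.
    """
    encoded = "|".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(encoded) % num_fragments


def distribution(rows, key_columns):
    """Count how many rows land on each fragment for the given key columns."""
    counts = Counter(
        fragment_for(tuple(row[c] for c in key_columns)) for row in rows
    )
    return [counts.get(f, 0) for f in range(NUM_FRAGMENTS)]


def make_rows(n=100_000, seed=42):
    """Generate rows where a1 has 3 distinct values and b1 has 10,000."""
    rng = random.Random(seed)
    return [
        {"a1": rng.randrange(3), "b1": rng.randrange(10_000)} for _ in range(n)
    ]


rows = make_rows()
single = distribution(rows, ["a1"])        # hash on the first join column only
multi = distribution(rows, ["a1", "b1"])   # hash on all join columns

print("hash on a1 only:  ", single)
print("hash on (a1, b1): ", multi)
```

With only 3 distinct values in a1, at most 3 of the 8 fragments can receive 
any rows under single-column distribution, whereas hashing on both columns 
can spread the rows over every fragment.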

Doing the hash distribution on all columns should be the default, so I propose 
to change planner.enable_hash_single_key to false.  The scenario where we might 
still want single-column hash distribution is when the join is done after some 
other operation (e.g. a window function or grouped aggregation) where the child 
already does a hash distribution on one column that is part of the join.  
For those cases, however, we may want to enable this flag selectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
