Chao Sun created SPARK-35703:
--------------------------------

             Summary: Remove HashClusteredDistribution
                 Key: SPARK-35703
                 URL: https://issues.apache.org/jira/browse/SPARK-35703
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Chao Sun


Currently Spark has both {{HashClusteredDistribution}} and 
{{ClusteredDistribution}}. The only difference between the two is that the 
former is stricter when deciding whether a bucket join can avoid a 
shuffle: compared to the latter, it requires an *exact* match between the 
clustering keys of the output partitioning and the join keys. However, this 
is unnecessary, as we should be able to avoid the shuffle whenever the set of 
clustering keys is a subset of the join keys, just as with {{ClusteredDistribution}}. 
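
To make the two rules concrete, here is a minimal sketch in Python. It is a simplified model, not Spark's actual classes: the function names {{satisfies_exact}} and {{satisfies_subset}} are hypothetical, keys are plain column-name strings, and "exact match" is assumed to mean the same keys in the same order. The intuition behind the relaxed rule is that rows agreeing on all join keys also agree on any subset of them, so they already sit in the same partition.

```python
from typing import Sequence


def satisfies_exact(partition_keys: Sequence[str], join_keys: Sequence[str]) -> bool:
    """Strict rule (hypothetical model): the partitioning keys must equal
    the join keys, position by position."""
    return list(partition_keys) == list(join_keys)


def satisfies_subset(partition_keys: Sequence[str], join_keys: Sequence[str]) -> bool:
    """Relaxed rule (hypothetical model): every partitioning key must
    appear among the join keys."""
    return len(partition_keys) > 0 and set(partition_keys) <= set(join_keys)


# A table bucketed on column `a`, joined on columns (a, b).
bucket_keys = ["a"]
join_keys = ["a", "b"]

# Under the strict rule the existing bucketing is rejected, forcing a shuffle.
needs_shuffle_strict = not satisfies_exact(bucket_keys, join_keys)

# Under the relaxed rule the existing bucketing is accepted as-is.
needs_shuffle_relaxed = not satisfies_subset(bucket_keys, join_keys)
```

This model checks only one side of the join. A real planner would also have to confirm that both sides are partitioned compatibly with each other before skipping the shuffle.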



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
