Chao Sun created SPARK-35703:
--------------------------------

Summary: Remove HashClusteredDistribution
Key: SPARK-35703
URL: https://issues.apache.org/jira/browse/SPARK-35703
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.0
Reporter: Chao Sun

Spark currently has both {{HashClusteredDistribution}} and {{ClusteredDistribution}}. The only difference between the two is that the former is stricter when deciding whether a bucket join can avoid a shuffle: compared with the latter, it requires an *exact* match between the clustering keys of the output partitioning and the join keys. This is unnecessary, however: as with {{ClusteredDistribution}}, we should be able to avoid the shuffle whenever the set of clustering keys is a subset of the join keys.
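
To make the difference between the two rules concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the function names {{exact_match_satisfied}} and {{subset_satisfied}} are hypothetical, and keys are modeled as plain column-name strings rather than Catalyst expressions.

```python
from typing import Sequence


def exact_match_satisfied(partition_keys: Sequence[str],
                          join_keys: Sequence[str]) -> bool:
    """Strict rule (toy stand-in for the HashClusteredDistribution check):
    the partitioning keys must equal the join keys, position by position."""
    return list(partition_keys) == list(join_keys)


def subset_satisfied(partition_keys: Sequence[str],
                     join_keys: Sequence[str]) -> bool:
    """Relaxed rule (toy stand-in for the ClusteredDistribution check):
    every partitioning key must appear among the join keys."""
    return len(partition_keys) > 0 and set(partition_keys) <= set(join_keys)


# Hypothetical scenario: a table bucketed by column "a", joined on ("a", "b").
bucket_keys = ["a"]
join_keys = ["a", "b"]

needs_shuffle_strict = not exact_match_satisfied(bucket_keys, join_keys)
needs_shuffle_relaxed = not subset_satisfied(bucket_keys, join_keys)

print(needs_shuffle_strict)   # True: strict rule forces a shuffle
print(needs_shuffle_relaxed)  # False: relaxed rule reuses the bucketing
```

In this model, rows with equal values of ("a", "b") necessarily have equal values of "a", so they already sit in the same bucket; that is why the subset rule can skip the shuffle.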