[ https://issues.apache.org/jira/browse/SPARK-41471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652484#comment-17652484 ]

Mars commented on SPARK-41471:
------------------------------

[~csun] Hi, I want to take it :)

> SPJ: Reduce Spark shuffle when only one side of a join is
> KeyGroupedPartitioning
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-41471
>                 URL: https://issues.apache.org/jira/browse/SPARK-41471
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Chao Sun
>            Priority: Major
>
> When only one side of a SPJ (Storage-Partitioned Join) is
> {{KeyGroupedPartitioning}}, Spark currently needs to shuffle both sides
> using {{HashPartitioning}}. However, we may just need to shuffle the
> other side according to the partition transforms defined in
> {{KeyGroupedPartitioning}}. This is especially useful when the other side
> is relatively small.
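
To make the proposal in the issue description concrete, here is a minimal sketch in plain Python. It is not Spark code and not the implementation that was eventually written for this ticket. It only models the idea: one side is already laid out by a partition transform, so only the other (small) side is repartitioned with that same transform before a per-partition join. The names `bucket`, `partition_by`, `join_partitions`, and `NUM_BUCKETS` are hypothetical, and the modulo-based `bucket` is an assumed stand-in for a real transform, which would typically hash the key first.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count of the storage-partitioned side


def bucket(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Assumed stand-in for a partition transform such as bucket(N, key)."""
    return key % num_buckets


def partition_by(rows, transform):
    """Group (key, value) rows into partitions keyed by transform(key)."""
    parts = defaultdict(list)
    for key, value in rows:
        parts[transform(key)].append((key, value))
    return parts


def join_partitions(left_parts, right_parts):
    """Inner-join two co-partitioned sides, one partition at a time."""
    out = []
    for pid, left_rows in left_parts.items():
        right_by_key = defaultdict(list)
        for key, value in right_parts.get(pid, []):
            right_by_key[key].append(value)
        for key, left_value in left_rows:
            for right_value in right_by_key.get(key, []):
                out.append((key, left_value, right_value))
    return sorted(out)


# Large side: assumed to be stored already grouped by bucket(key),
# so building large_parts models reading the existing layout, not a shuffle.
large = [(k, f"order-{k}") for k in range(1000)]
large_parts = partition_by(large, bucket)

# Small side: the only side that is actually moved, using the same transform.
small = [(k, f"user-{k}") for k in (3, 42, 999)]
small_parts = partition_by(small, bucket)

rows_shuffled_both_sides = len(large) + len(small)  # behavior described as current
rows_shuffled_one_side = len(small)                 # behavior the issue proposes

result = join_partitions(large_parts, small_parts)
```

In this toy setup the proposed approach moves 3 rows instead of 1003, which is why the description notes that the saving is largest when the other side is relatively small.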