[ https://issues.apache.org/jira/browse/SPARK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712350#comment-16712350 ]
David Vrba edited comment on SPARK-25401 at 12/7/18 10:59 AM:
--------------------------------------------------------------

I was looking at it, and I believe that in the class EnsureRequirements we could reorder the join predicates for SortMergeJoin once more, just before we check whether the child's outputOrdering satisfies the requiredOrdering, so that the predicate keys are aligned with the child's outputOrdering. In that case it will not add the unnecessary SortExec, and it will not add an unnecessary Exchange either, because the Exchange is handled before this step.

What do you guys think? Is it a good approach? (Please be patient with me, this is my first Jira on Spark.)

> Reorder the required ordering to match the table's output ordering for bucket
> join
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-25401
>                 URL: https://issues.apache.org/jira/browse/SPARK-25401
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> Currently, we check whether a SortExec is needed between an operator and its
> child operator in the method orderingSatisfies, and orderingSatisfies requires
> the SortOrders to be in exactly the same order.
> However, consider the following case.
> * Table a is bucketed by (a1, a2), sorted by (a2, a1), and the number of buckets is 200.
> * Table b is bucketed by (b1, b2), sorted by (b2, b1), and the number of buckets is 200.
> * Table a is joined with table b on (a1=b1, a2=b2).
> In this case, if the join is a sort merge join, the query planner won't add an
> exchange on either side, but it will add a sort on both sides. Actually, the
> sort is also unnecessary, since within the same bucket (e.g. bucket 1 of table a
> and bucket 1 of table b), (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1).
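The reordering proposed in the comment can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model: an ordering is a list of column names (all ascending), and a required ordering is "satisfied" when it is a prefix of the child's output ordering. The helpers `ordering_satisfies` and `reorder_keys_to_ordering` are hypothetical names for illustration only; they are not Spark's EnsureRequirements API, which works on SortOrder expressions and semantic equality.

```python
from typing import List, Optional, Sequence, Tuple


def ordering_satisfies(actual: Sequence[str], required: Sequence[str]) -> bool:
    # Simplified model (assumption): the required ordering is satisfied when it
    # is a prefix of the child's actual output ordering.
    return list(actual[:len(required)]) == list(required)


def reorder_keys_to_ordering(
    left_keys: Sequence[str],
    right_keys: Sequence[str],
    child_ordering: Sequence[str],
) -> Optional[Tuple[List[str], List[str]]]:
    # Hypothetical helper: permute the (left, right) join key pairs so that the
    # left keys follow the leading columns of the left child's output ordering.
    # Returns None when the left keys are not a permutation of that prefix.
    n = len(left_keys)
    if len(set(left_keys)) != n:
        return None  # duplicate keys are not handled in this sketch
    prefix = list(child_ordering[:n])
    if len(prefix) != n or sorted(prefix) != sorted(left_keys):
        return None
    # Keep each equi-join pair together while changing their order.
    pair_by_left = dict(zip(left_keys, right_keys))
    return prefix, [pair_by_left[k] for k in prefix]


if __name__ == "__main__":
    left_keys, right_keys = ["a1", "a2"], ["b1", "b2"]
    left_ordering, right_ordering = ["a2", "a1"], ["b2", "b1"]

    # Without reordering, (a1, a2) does not match (a2, a1), so a sort is added.
    print(ordering_satisfies(left_ordering, left_keys))  # False

    new_left, new_right = reorder_keys_to_ordering(left_keys, right_keys, left_ordering)
    print(new_left, new_right)  # ['a2', 'a1'] ['b2', 'b1']
    print(ordering_satisfies(left_ordering, new_left),
          ordering_satisfies(right_ordering, new_right))  # True True
```

Because each pair (a1=b1, a2=b2) stays intact and only the order of the pairs changes, the join condition is unchanged; in this model both children already satisfy the reordered requirement, so no sort would be needed on either side.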