[jira] [Comment Edited] (SPARK-25401) Reorder the required ordering to match the table's output ordering for bucket join

David Vrba (JIRA) Fri, 07 Dec 2018 03:09:02 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712350#comment-16712350
 ]


David Vrba edited comment on SPARK-25401 at 12/7/18 10:59 AM:
--------------------------------------------------------------

I looked into this, and I believe that in the class EnsureRequirements we could 
reorder the join predicates for SortMergeJoin once more, just before we check 
whether the child's outputOrdering satisfies the requiredOrdering, and align the 
predicate keys with the child's outputOrdering. In that case the planner would 
not add the unnecessary SortExec, and it would not add an unnecessary 
Exchange either, because Exchange is handled earlier.
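
The reordering idea can be sketched in a few lines. This is plain Python rather than Spark's Scala, and it is not taken from EnsureRequirements: the helper name reorder_join_keys and the use of column-name strings in place of Catalyst expressions are assumptions made purely for illustration. It permutes the (left, right) key pairs so that the left keys follow the left child's output ordering, and leaves the keys unchanged when that ordering does not cover them.

```python
from typing import List, Sequence, Tuple


def reorder_join_keys(
    left_keys: Sequence[str],
    right_keys: Sequence[str],
    left_child_ordering: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Permute join key pairs to follow the left child's sort order.

    Hypothetical helper: returns the keys unchanged when the leading
    columns of the child's ordering are not exactly the set of left keys.
    """
    n = len(left_keys)
    prefix = list(left_child_ordering[:n])
    if len(set(left_keys)) != n or sorted(prefix) != sorted(left_keys):
        return list(left_keys), list(right_keys)
    # Keep each left key paired with its right key while permuting.
    partner = dict(zip(left_keys, right_keys))
    return prefix, [partner[k] for k in prefix]


# Join on (a1=b1, a2=b2) where table a is sorted by (a2, a1).
left, right = reorder_join_keys(["a1", "a2"], ["b1", "b2"], ["a2", "a1"])
print(left, right)
```

In this sketch only the left child drives the permutation; a real change would presumably also have to confirm that the right child's ordering, here (b2, b1), matches the permuted right keys before skipping the sort.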

 

What do you guys think? Is it a good approach? (Please be patient with me, this 
is my first Jira on Spark)


was (Author: vrbad):
I was looking at it and i believe that it the class EnsureRequirements we could 
reorder the join predicates for SortMergeJoin once more - just before we check 
if child outputOrdering satisfies the requiredOrdering - and we can align the 
predicate keys with the child outputOrdering. In such case it is not going to 
add the unnecessary SortExec and also it is not going to add unnecessary 
Exchange either, because Exchange is handled before.

 

What do you guys think? Is it a good approach? (Please be patient with me, this 
is my first Jira on Spark)

> Reorder the required ordering to match the table's output ordering for bucket 
> join
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-25401
>                 URL: https://issues.apache.org/jira/browse/SPARK-25401
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> Currently, we check whether a SortExec is needed between an operator and its 
> child operator in the method orderingSatisfies, and orderingSatisfies requires 
> the SortOrders to appear in exactly the same order.
> However, consider the following case.
>  * Table a is bucketed by (a1, a2), sorted by (a2, a1), and has 200 buckets.
>  * Table b is bucketed by (b1, b2), sorted by (b2, b1), and has 200 buckets.
>  * Table a joins table b on (a1=b1, a2=b2).
> In this case, if the join is a sort merge join, the query planner won't add 
> an exchange on either side, but a sort will be added on both sides. The 
> sort is also unnecessary, since within matching buckets, such as bucket 1 of 
> table a and bucket 1 of table b, (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1).
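
To make the description's case concrete, here is a simplified model of the ordering check in plain Python. It is an assumption-laden stand-in, not Spark's code: the real orderingSatisfies compares SortOrder expressions (including sort direction), while this version compares only column names and treats the check as "the required ordering must be a prefix of the child's ordering".

```python
from typing import Sequence


def ordering_satisfies(actual: Sequence[str], required: Sequence[str]) -> bool:
    """Simplified stand-in: required must be a prefix of actual."""
    if len(required) > len(actual):
        return False
    return all(a == r for a, r in zip(actual, required))


table_a_sorted_by = ["a2", "a1"]

# Required ordering derived from the join keys as written: (a1, a2).
needs_sort = not ordering_satisfies(table_a_sorted_by, ["a1", "a2"])

# Same predicates written as (a2=b2, a1=b1): required ordering (a2, a1).
needs_sort_reordered = not ordering_satisfies(table_a_sorted_by, ["a2", "a1"])

print(needs_sort, needs_sort_reordered)
```

Under this model the sort is requested only because of the order in which the join keys happen to be written, which is the behaviour the issue asks to improve.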



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
