hvanhovell opened a new pull request #25174: [SPARK-27485][BRANCH-2.4] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25174
 
 
   Backport of 421d9d56efd447d31787e77316ce0eafb5fe45a5
   
   ## What changes were proposed in this pull request?
   When reordering join keys, EnsureRequirements only checks whether all the join 
keys are present in the partitioning expression seq. This is problematic when 
the join keys and the partitioning expressions both contain duplicates, but not 
the same number of duplicates for each expression, e.g. `Seq(a, a, b)` vs 
`Seq(a, b, b)`. This fails with an index lookup failure in the `reorder` function.
   
   This PR fixes this by removing the equality checking logic from the 
`reorderJoinKeys` function and performing the multiset equality check in the 
`reorder` function while building the reordered key sequences.
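   
   The idea of checking multiset equality while building the reordered keys can be sketched as follows. This is a simplified illustration, not the code in this patch: plain strings stand in for Catalyst expressions, and the names `ReorderSketch`, `expectedOrder`, `currentOrder` and `positions` are assumptions made for the example. Each key maps to the queue of positions it still has available; if a key is missing or its positions are exhausted, the original keys are returned unchanged.

```scala
import scala.collection.mutable

object ReorderSketch {
  // Strings stand in for expressions; hypothetical helper, not Spark's implementation.
  def reorder(
      leftKeys: IndexedSeq[String],
      rightKeys: IndexedSeq[String],
      expectedOrder: Seq[String],
      currentOrder: Seq[String]): (Seq[String], Seq[String]) = {
    // Different lengths can never be multiset-equal.
    if (expectedOrder.size != currentOrder.size) return (leftKeys, rightKeys)

    // Map each key to the positions it occupies in the current order.
    val positions = mutable.Map.empty[String, mutable.Queue[Int]]
    currentOrder.zipWithIndex.foreach { case (key, i) =>
      positions.getOrElseUpdate(key, mutable.Queue.empty[Int]) += i
    }

    val left = Vector.newBuilder[String]
    val right = Vector.newBuilder[String]
    val it = expectedOrder.iterator
    while (it.hasNext) {
      positions.get(it.next()) match {
        case Some(queue) if queue.nonEmpty =>
          // Consume one available position for this key.
          val i = queue.dequeue()
          left += leftKeys(i)
          right += rightKeys(i)
        case _ =>
          // Key is unknown or has more duplicates than available: leave the keys as they are.
          return (leftKeys, rightKeys)
      }
    }
    (left.result(), right.result())
  }
}
```

   With current keys `Seq(a, a, b)` and expected order `Seq(a, b, b)`, the second `b` finds no remaining position, so the sketch falls back to the original keys instead of failing on a lookup.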
   
   ## How was this patch tested?
   Added a unit test to `PlannerSuite` and an integration test to `JoinSuite`.

