XiDuo You created SPARK-37502: --------------------------------- Summary: Support cast aware output partitioning and required if it can up cast Key: SPARK-37502 URL: https://issues.apache.org/jira/browse/SPARK-37502 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: XiDuo You
if a `Cast` is up cast then it should be without any truncating or precision lose or possible runtime failures. So the output partitioning should be same with/without `Cast` if the `Cast` is up cast. Let's say we have a query: {code:java} -- v1: c1 int -- v2: c2 long SELECT * FROM v2 JOIN (SELECT c1, count(*) FROM v1 GROUP BY c1) v1 ON v1.c1 = v2.c2 {code} The executed plan contains three shuffle nodes which looks like: {code:java} SortMergeJoin Exchange(cast(c1 as bigint)) HashAggregate Exchange(c1) Scan v1 Exchange(c2) Scan v2 {code} We can simply the plan using two shuffle nodes: {code:java} SortMergeJoin HashAggregate Exchange(c1) Scan v1 Exchange(c2) Scan v2 {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org