[ 
https://issues.apache.org/jira/browse/SPARK-34807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-34807.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31980
[https://github.com/apache/spark/pull/31980]

> Push down filter through window after TransposeWindow
> -----------------------------------------------------
>
>                 Key: SPARK-34807
>                 URL: https://issues.apache.org/jira/browse/SPARK-34807
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Assignee: Tanel Kiis
>            Priority: Major
>             Fix For: 3.2.0
>
>
> {code:scala}
>       spark.range(10).selectExpr("id AS a", "id AS b", "id AS c", "id AS 
> d").createTempView("t1")
>       val df = spark.sql(
>         """
>           |SELECT *
>           |  FROM (
>           |    SELECT b,
>           |      sum(d) OVER (PARTITION BY a, b),
>           |      rank() OVER (PARTITION BY a ORDER BY c)
>           |    FROM t1
>           |  ) v1
>           |WHERE b = 2
>           |""".stripMargin)
> {code}
> Current optimized plan:
> {noformat}
> == Optimized Logical Plan ==
> Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED 
> PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY 
> c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
> +- Filter (b#221L = 2)
>    +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS 
> FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) 
> AS RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN 
> UNBOUNDED PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST]
>       +- Project [b#221L, a#220L, c#222L, sum(d) OVER (PARTITION BY a, b ROWS 
> BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L]
>          +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND 
> UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L]
>             +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS 
> a#220L, id#218L AS c#222L]
>                +- Range (0, 10, step=1, splits=Some(2))
> {noformat}
> Expected optimized plan:
> {noformat}
> == Optimized Logical Plan ==
> Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED 
> PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY 
> c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
> +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND 
> UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L]
>    +- Project [b#221L, d#223L, a#220L, RANK() OVER (PARTITION BY a ORDER BY c 
> ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
>       +- Filter (b#221L = 2)
>          +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC 
> NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), 
> currentrow$())) AS RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST 
> ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC 
> NULLS FIRST]
>             +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS 
> a#220L, id#218L AS c#222L]
>                +- Range (0, 10, step=1, splits=Some(2))
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to