[jira] [Comment Edited] (SPARK-33861) Simplify conditional in predicate

Yuming Wang (Jira) Sat, 03 Sep 2022 01:55:08 -0700


    [ https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599868#comment-17599868 ]


Yuming Wang edited comment on SPARK-33861 at 9/3/22 8:54 AM:
-------------------------------------------------------------

Note that only 3.2.0, 3.2.1, 3.2.2 and 3.3.0 include this optimization. We 
recovered it via 
[https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec]


was (Author: q79969786):
Note that only 3.2.0, 3.2.1 and 3.3.0 include this optimization. We recovered 
it via 
https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec

> Simplify conditional in predicate
> ---------------------------------
>
>                 Key: SPARK-33861
>                 URL: https://issues.apache.org/jira/browse/SPARK-33861
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 5").explain()
> {noformat}
> Before this PR:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>    +- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}
> After this PR:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>    +- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), GreaterThan(a,2)], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}
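
The plan before the change already has the comparison pushed into the branch, CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END; after the change the filter becomes the conjunction (a#3L > 2) AND ((b#4L + 10) > 5), part of which the Parquet scan can push down (GreaterThan(a,2)). A rough way to see why this rewrite is only valid inside a predicate is to model SQL three-valued logic in plain Python, with None standing in for NULL. This is an illustrative sketch, not Spark's Catalyst code; the helper names (case_when, sql_and, keeps_row) are hypothetical.

```python
# Illustrative sketch only, not Spark's implementation.
# Models SQL three-valued logic: True, False, and None (None stands for SQL NULL).
from itertools import product

TRUTH_VALUES = (True, False, None)


def case_when(cond, value):
    """CASE WHEN cond THEN value END with no ELSE branch (implicit ELSE NULL)."""
    # A NULL or FALSE condition does not match, so the result is NULL.
    return value if cond is True else None


def sql_and(left, right):
    """SQL AND under three-valued logic."""
    if left is False or right is False:
        return False
    if left is None or right is None:
        return None
    return True


def keeps_row(predicate_result):
    """A filter keeps a row only when the predicate is TRUE; FALSE and NULL both drop it."""
    return predicate_result is True


def rewrite_is_safe_in_filter():
    # In a WHERE clause only "TRUE vs. not TRUE" matters.
    return all(
        keeps_row(case_when(c, p)) == keeps_row(sql_and(c, p))
        for c, p in product(TRUTH_VALUES, repeat=2)
    )


def rewrite_is_safe_in_projection():
    # In a SELECT list the exact value (FALSE vs. NULL) is visible.
    return all(
        case_when(c, p) == sql_and(c, p)
        for c, p in product(TRUTH_VALUES, repeat=2)
    )


print(rewrite_is_safe_in_filter())      # True
print(rewrite_is_safe_in_projection())  # False
```

A CASE WHEN without ELSE yields NULL whenever its condition is not TRUE, while the AND form yields FALSE or NULL in those cases. A filter drops the row either way, but a SELECT list would expose the difference, which is why this kind of simplification only applies in predicate position.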



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
