[ https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599868#comment-17599868 ]
Yuming Wang edited comment on SPARK-33861 at 9/3/22 8:54 AM:
-------------------------------------------------------------

Note that only 3.2.0, 3.2.1, 3.2.2 and 3.3.0 include this optimization. We recovered it via [https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec]

was (Author: q79969786):
Note that only 3.2.0, 3.2.1 and 3.3.0 include this optimization. We recovered it via https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec

> Simplify conditional in predicate
> ---------------------------------
>
>                 Key: SPARK-33861
>                 URL: https://issues.apache.org/jira/browse/SPARK-33861
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 5").explain()
> {noformat}
> Before this PR:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>    +- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}
> After this PR:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>    +- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), GreaterThan(a,2)], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}
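The before/after plans rely on SQL three-valued logic. A WHERE clause keeps a row only when its predicate is TRUE, so the implicit ELSE NULL of `CASE WHEN c THEN p END` drops the row exactly as FALSE would. That makes the CASE expression interchangeable with `c AND p` inside a filter, though not as a general value. The sketch below is a pure-Python model, not Spark code: it represents TRUE/FALSE/NULL as `True`/`False`/`None`, and every function name in it (`sql_and`, `case_when_no_else`, `keeps_row`, and so on) is hypothetical. It checks the equivalence over all input combinations and then on the `t1` rows from the use case, with a few extra NULL rows added as an assumption.

```python
from itertools import product
from typing import Optional

# Hypothetical model, not Spark code: SQL TRUE/FALSE/NULL as True/False/None.
Bool3 = Optional[bool]


def sql_and(x: Bool3, y: Bool3) -> Bool3:
    """SQL AND in three-valued logic: FALSE wins, then NULL, else TRUE."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def sql_gt(x: Optional[int], y: Optional[int]) -> Bool3:
    """SQL '>': NULL if either operand is NULL."""
    if x is None or y is None:
        return None
    return x > y


def sql_add(x: Optional[int], y: Optional[int]) -> Optional[int]:
    """SQL '+': NULL if either operand is NULL."""
    return None if x is None or y is None else x + y


def case_when_no_else(cond: Bool3, value: Bool3) -> Bool3:
    """CASE WHEN cond THEN value END: a FALSE or NULL condition falls
    through to the implicit ELSE NULL."""
    return value if cond is True else None


def keeps_row(pred: Bool3) -> bool:
    """WHERE keeps a row only when the predicate is TRUE."""
    return pred is True


VALUES = (True, False, None)

# In a filter, CASE WHEN c THEN p END and (c AND p) keep the same rows.
equivalent_in_filter = all(
    keeps_row(case_when_no_else(c, p)) == keeps_row(sql_and(c, p))
    for c, p in product(VALUES, VALUES)
)
# They are not equal as values (c = FALSE gives NULL vs FALSE), so the
# rewrite is only safe where the result is consumed as a predicate.
equivalent_as_value = all(
    case_when_no_else(c, p) == sql_and(c, p)
    for c, p in product(VALUES, VALUES)
)


def original_predicate(a: Optional[int], b: Optional[int]) -> Bool3:
    # CASE WHEN a > 2 THEN b + 10 END > 5
    case_value = sql_add(b, 10) if sql_gt(a, 2) is True else None
    return sql_gt(case_value, 5)


def rewritten_predicate(a: Optional[int], b: Optional[int]) -> Bool3:
    # (a > 2) AND ((b + 10) > 5)
    return sql_and(sql_gt(a, 2), sql_gt(sql_add(b, 10), 5))


# Rows of t1 (a = b = id for id in range(10)) plus a few assumed NULL rows.
rows = [(i, i) for i in range(10)] + [(None, 1), (3, None), (None, None)]
kept_original = [r for r in rows if keeps_row(original_predicate(*r))]
kept_rewritten = [r for r in rows if keeps_row(rewritten_predicate(*r))]

print(equivalent_in_filter, equivalent_as_value)  # True False
print(kept_original == kept_rewritten)  # True
```

In the after-plan, the `isnotnull(a#3L)` and `isnotnull(b#4L)` conditions are presumably added by Spark's separate null-constraint inference rather than by this rewrite, and `((b#4L + 10) > 5)` likely stays out of PushedFilters because data source filter pushdown only accepts simple column-versus-literal comparisons such as `GreaterThan(a,2)`.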