[ https://issues.apache.org/jira/browse/SPARK-39893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wan Kun updated SPARK-39893:
----------------------------
    Summary: Remove redundant aggregate if it is group only and all grouping and aggregate expressions are foldable  (was: Remove Aggregate if it is group only and all grouping and aggregate expressions are foldable)

> Remove redundant aggregate if it is group only and all grouping and aggregate expressions are foldable
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39893
>                 URL: https://issues.apache.org/jira/browse/SPARK-39893
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Wan Kun
>            Priority: Major
>
> If all groupingExpressions and aggregateExpressions in an aggregate are foldable, we can remove this aggregate.
> For example, take the query:
> {code:java}
> SELECT distinct 1001 as id , cast('2022-06-03' as date) AS DT FROM testData
> {code}
> the grouping expressions are: *[1001, 2022-06-03]*
> the aggregate expressions are: *[1001 AS id#274, 2022-06-03 AS DT#275]*
> so we can skip scanning table testData and remove the aggregate operation.
> Before this PR:
> {code:java}
> Aggregate [1001, 2022-06-03], [1001 AS id#274, 2022-06-03 AS DT#275], Statistics(sizeInBytes=16.0 EiB)
> +- SerializeFromObject, Statistics(sizeInBytes=8.0 EiB)
>    +- ExternalRDD [obj#12], Statistics(sizeInBytes=8.0 EiB)
> {code}
> After this PR:
> {code:java}
> Project [1001 AS id#218, 2022-06-03 AS DT#219], Statistics(sizeInBytes=2.0 B)
> +- OneRowRelation, Statistics(sizeInBytes=1.0 B)
> {code}
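
The rewrite described in this issue can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's Catalyst code: the classes `Literal`, `Column`, `Alias`, `Relation`, `OneRowRelation`, `Aggregate` and `Project`, and the function `remove_foldable_aggregate`, are hypothetical stand-ins named after the plan nodes shown above. It also assumes the child relation is non-empty, because a grouped aggregate over an empty input returns no rows, while a Project over OneRowRelation always returns one.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Literal:
    """A constant expression; always foldable in this toy model."""
    value: object


@dataclass(frozen=True)
class Column:
    """A reference to a column of the child plan; never foldable."""
    name: str


@dataclass(frozen=True)
class Alias:
    """A named output expression such as `1001 AS id`."""
    child: object
    name: str


@dataclass(frozen=True)
class Relation:
    """A table scan (hypothetical stand-in for the testData relation)."""
    name: str


@dataclass(frozen=True)
class OneRowRelation:
    """A relation that produces exactly one row with no columns."""


@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[object, ...]
    aggregates: Tuple[object, ...]
    child: object


@dataclass(frozen=True)
class Project:
    project_list: Tuple[object, ...]
    child: object


def is_foldable(expr) -> bool:
    """An expression is foldable if it is a literal or an alias of one."""
    if isinstance(expr, Literal):
        return True
    if isinstance(expr, Alias):
        return is_foldable(expr.child)
    return False


def remove_foldable_aggregate(plan):
    """Replace a group-only Aggregate whose expressions are all foldable
    with a Project over OneRowRelation; return other plans unchanged.

    Assumption: the child is non-empty (see the note above the code).
    """
    if (isinstance(plan, Aggregate)
            and plan.grouping
            and all(is_foldable(e) for e in plan.grouping)
            and all(is_foldable(e) for e in plan.aggregates)):
        return Project(plan.aggregates, OneRowRelation())
    return plan


# The query from the issue: SELECT distinct 1001 AS id, <date> AS DT FROM testData
before = Aggregate(
    grouping=(Literal(1001), Literal("2022-06-03")),
    aggregates=(Alias(Literal(1001), "id"), Alias(Literal("2022-06-03"), "DT")),
    child=Relation("testData"),
)
after = remove_foldable_aggregate(before)

# A plan that groups by a real column must be left alone.
kept = Aggregate((Column("key"),), (Column("key"),), Relation("testData"))
```

Here `after` is a `Project` over `OneRowRelation`, so the scan of `testData` is gone, while `kept` passes through the function unchanged because its grouping expression is not foldable.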