[ https://issues.apache.org/jira/browse/SPARK-40002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-40002.
----------------------------------
    Fix Version/s: 3.3.1
                   3.2.3
                   3.4.0
       Resolution: Fixed

Issue resolved by pull request 37443
[https://github.com/apache/spark/pull/37443]

> Limit improperly pushed down through window using ntile function
> ----------------------------------------------------------------
>
>                 Key: SPARK-40002
>                 URL: https://issues.apache.org/jira/browse/SPARK-40002
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0, 3.2.2
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>              Labels: correctness
>             Fix For: 3.3.1, 3.2.3, 3.4.0
>
>
> Limit is pushed down through a window using the ntile function, which causes
> results that differ from Hive 2.3.9, and Prestodb 0.268, and older versions
> of Spark (e.g., 3.1.3).
>
> Assume this data:
> {noformat}
> create table t1 stored as parquet as
> select *
> from range(101);
> {noformat}
>
> Also assume this query:
> {noformat}
> select id, ntile(10) over (order by id) as nt
> from t1
> limit 10;
> {noformat}
>
> Spark 3.2.2, Spark 3.3.0, and master produce the following:
> {noformat}
> +---+---+
> |id |nt |
> +---+---+
> |0  |1  |
> |1  |2  |
> |2  |3  |
> |3  |4  |
> |4  |5  |
> |5  |6  |
> |6  |7  |
> |7  |8  |
> |8  |9  |
> |9  |10 |
> +---+---+
> {noformat}
>
> However, Spark 3.1.3, Hive 2.3.9, and Prestodb 0.268 produce the following:
> {noformat}
> +---+---+
> |id |nt |
> +---+---+
> |0  |1  |
> |1  |1  |
> |2  |1  |
> |3  |1  |
> |4  |1  |
> |5  |1  |
> |6  |1  |
> |7  |1  |
> |8  |1  |
> |9  |1  |
> +---+---+
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
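
The difference between the two result sets above can be reproduced without Spark. The following is a minimal pure-Python sketch, not Spark's implementation: the helper `ntile` and the variable names are hypothetical, and it assumes the standard SQL ntile semantics, where a partition of n rows is split into k buckets and the first n mod k buckets each receive one extra row. It contrasts evaluating the window over all 101 rows and then applying the limit with applying the limit first and then evaluating the window over only 10 rows.

```python
def ntile(n_rows, buckets):
    """Return the 1-based bucket number for each row of an ordered partition.

    Hypothetical helper assuming standard SQL ntile semantics: the first
    (n_rows % buckets) buckets each hold one extra row.
    """
    base, extra = divmod(n_rows, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


# Same data as `select * from range(101)`, already ordered by id.
ids = list(range(101))

# Window first, then limit: ntile sees all 101 rows.
window_then_limit = list(zip(ids, ntile(len(ids), 10)))[:10]

# Limit first, then window: ntile only sees the 10 rows that survived the limit.
limited_ids = ids[:10]
limit_then_window = list(zip(limited_ids, ntile(len(limited_ids), 10)))

print(window_then_limit)
print(limit_then_window)
```

With 101 rows and 10 buckets, the first bucket holds 11 rows, so ids 0 through 9 all fall in bucket 1 when the window is evaluated first. When only 10 rows reach the window, each row gets its own bucket, which matches the 1 through 10 column reported for the affected versions.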