[ https://issues.apache.org/jira/browse/SPARK-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lior Chaga resolved SPARK-22840.
--------------------------------
    Resolution: Not A Bug

> Incorrect results when using distinct on window
> -----------------------------------------------
>
>                 Key: SPARK-22840
>                 URL: https://issues.apache.org/jira/browse/SPARK-22840
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Lior Chaga
>         Attachments: sample.parquet.zip
>
>
> Given the following schema:
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- start_time: long (nullable = true)
>  |-- stats: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- calibratedRecsHistory: double (nullable = true)
>  |    |    |-- eventTime: long (nullable = true)
>  |    |    |-- itemId: long (nullable = true)
>  |    |    |-- recsHistory: long (nullable = true)
> {code}
> The data contains multiple rows per id and start_time, and the stats
> elements for a specific id and start_time are identical in all rows.
> I've noticed inconsistent results when using a window with
> FIRST(stats) DESC and LAST(stats) ASC.
> Specifically, the latter (LAST with ASC) produces more results.
> This is the query for seeing that:
> {code}
> SELECT DISTINCT
>     id,
>     LAST(stats) over w
> FROM sample
> WINDOW w AS (PARTITION BY id SORT BY start_time DESC)
> except
> SELECT DISTINCT
>     id,
>     FIRST(stats) over w
> FROM sample
> WINDOW w AS (PARTITION BY id SORT BY start_time ASC)
> {code}
> Each of the subqueries should return the stats for the latest start_time,
> partitioned by id.
> Changing the order of the subqueries returns nothing.
> The query with FIRST and ASC produces correct results.
> The data for sample is attached in [^sample.parquet.zip]
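
The message does not record why the issue was resolved as Not A Bug. One likely explanation, offered here as an assumption and not as the reporter's stated reasoning, is the default window frame: when a window has an ordering but no explicit frame, the frame runs from the start of the partition to the current row, so LAST returns the current row's value instead of the value from the final row of the partition. The sketch below reproduces that behaviour with Python's bundled SQLite (3.25 or newer) as a stand-in for Spark. The three-row table, the string-valued `stats` column, and the `run` helper are all made up for illustration, and SQLite spells the functions `FIRST_VALUE`/`LAST_VALUE` and needs `ORDER BY` in place of `SORT BY`.

```python
import sqlite3

# Tiny, made-up stand-in for the `sample` table: one id, three start_times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id TEXT, start_time INTEGER, stats TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [("a", 1, "s1"), ("a", 2, "s2"), ("a", 3, "s3")],
)


def run(func, direction, frame=""):
    """Hypothetical helper: run one of the issue's subqueries with an optional frame."""
    sql = f"""
        SELECT DISTINCT id, {func}(stats) OVER w
        FROM sample
        WINDOW w AS (PARTITION BY id ORDER BY start_time {direction} {frame})
    """
    return sorted(conn.execute(sql).fetchall())


# Explicit frame covering the whole partition.
FULL = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"

# Default frame: start of partition up to the current row.
first_asc = run("FIRST_VALUE", "ASC")    # first row of the frame is fixed
last_desc = run("LAST_VALUE", "DESC")    # last row of the frame is the current row

# With the full-partition frame, LAST sees the true last row.
last_desc_full = run("LAST_VALUE", "DESC", FULL)

print("FIRST / ASC, default frame:", first_asc)
print("LAST / DESC, default frame:", last_desc)
print("LAST / DESC, full frame:   ", last_desc_full)
```

Under the default frame the LAST query yields one distinct row per start_time, which would account for the extra results in the EXCEPT. With the explicit full-partition frame it matches the FIRST query.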