Attila Zsolt Piros created SPARK-22806:
------------------------------------------

             Summary: Window Aggregate functions: unexpected result at ordered partition
                 Key: SPARK-22806
                 URL: https://issues.apache.org/jira/browse/SPARK-22806
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Attila Zsolt Piros


I got different results for the aggregate functions (even for sum and count) when the window partition is ordered, "Window.partitionBy(column).orderBy(column)", and when it is not ordered, "Window.partitionBy(column)".

Example:

test("count, sum, stddev_pop functions over window") {
  val df = Seq(
    ("a", 1, 100.0),
    ("b", 1, 200.0)).toDF("key", "partition", "value")
  df.createOrReplaceTempView("window_table")
  checkAnswer(
    df.select(
      $"key",
      count("value").over(Window.partitionBy("partition")),
      sum("value").over(Window.partitionBy("partition")),
      stddev_pop("value").over(Window.partitionBy("partition"))
    ),
    Seq(
      Row("a", 2, 300.0, 50.0),
      Row("b", 2, 300.0, 50.0)))
}

test("count, sum, stddev_pop functions over ordered by window") {
  val df = Seq(
    ("a", 1, 100.0),
    ("b", 1, 200.0)).toDF("key", "partition", "value")
  df.createOrReplaceTempView("window_table")
  checkAnswer(
    df.select(
      $"key",
      count("value").over(Window.partitionBy("partition").orderBy("key")),
      sum("value").over(Window.partitionBy("partition").orderBy("key")),
      stddev_pop("value").over(Window.partitionBy("partition").orderBy("key"))
    ),
    Seq(
      Row("a", 2, 300.0, 50.0),
      Row("b", 2, 300.0, 50.0)))
}

The first test passes. The second test, "count, sum, stddev_pop functions over ordered by window", fails with the error:

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
!struct<>                   struct<key:string,count(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):bigint,sum(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double,stddev_pop(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double>
![a,2,300.0,50.0]           [a,1,100.0,0.0]
 [b,2,300.0,50.0]           [b,2,300.0,50.0]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
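
A possible angle for triaging the report above: the failing query leaves the window frame unspecified (the plan shows unspecifiedframe$()), and the Spark Window API documentation describes a growing default frame (unbounded preceding to current row) once an ordering is defined. Whether that accounts for the difference is not confirmed anywhere in this report. The sketch below is one way to check, by giving the ordered window an explicit frame that spans the whole partition. The object name OrderedWindowFrameSketch, the helper wholePartitionStats, and the local SparkSession settings are hypothetical and not part of the original tests.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, stddev_pop, sum}

// Hypothetical helper object, not taken from the Spark test suite.
object OrderedWindowFrameSketch {

  // Computes count/sum/stddev_pop over an ordered window whose frame is
  // set explicitly to the whole partition instead of the default frame.
  def wholePartitionStats(df: DataFrame): DataFrame = {
    val wholePartition = Window
      .partitionBy("partition")
      .orderBy("key")
      .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

    df.select(
      col("key"),
      count("value").over(wholePartition).as("cnt"),
      sum("value").over(wholePartition).as("total"),
      stddev_pop("value").over(wholePartition).as("sd"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-22806-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same two rows as in the report.
    val df = Seq(
      ("a", 1, 100.0),
      ("b", 1, 200.0)).toDF("key", "partition", "value")

    wholePartitionStats(df).show()
    spark.stop()
  }
}
```

If the explicit frame yields the same values for both rows, the difference seen in the second test would come from the default frame of an ordered window rather than from the aggregate functions themselves.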