Hello, I am trying to test the Spark SQL window functions described in the following blog post,
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html,
and I am facing the following problem:

# testing rowsBetween()
winSpec2 = window.Window.partitionBy(data["category"]).orderBy(data["revenue"]).rowsBetween(2, 2)
tmp4 = functions.max(data["revenue"]).over(winSpec2)
data.select(["product", "category", "revenue", tmp4.alias("rowbetween2and2")]).orderBy(["category", "revenue"]).show()

+----------+----------+-------+---------------+
|   product|  category|revenue|rowbetween2and2|
+----------+----------+-------+---------------+
|  Bendable|Cell phone|   3000|           5000|
|  Foldable|Cell phone|   3000|           6000|
|Ultra thin|Cell phone|   5000|           6000|
|      Thin|Cell phone|   6000|           null|  --> ???
| Very thin|Cell phone|   6000|           null|
|    Normal|    Tablet|   1500|           4500|
|       Big|    Tablet|   2500|           5500|
|       Pro|    Tablet|   4500|           6500|
|      Mini|    Tablet|   5500|           null|
|      Pro2|    Tablet|   6500|           null|
+----------+----------+-------+---------------+

As you can see, the last column is supposed to be the max revenue over a window consisting of the current row, the two preceding rows, and the two following rows within each category partition. However, the result for the last two rows of each category partition is null. Is there something that I missed, or is this a bug?
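
In case it helps, here is a minimal, self-contained version of what I am running. I recreated the sample data from the blog post's example table, and I assume an already-created SparkSession named `spark` (those parts are my own setup, not from the blog):

# Self-contained reproduction sketch (assumes an existing SparkSession `spark`;
# the sample rows are recreated from the blog post's example table).
from pyspark.sql import functions, window

data = spark.createDataFrame(
    [("Bendable", "Cell phone", 3000), ("Foldable", "Cell phone", 3000),
     ("Ultra thin", "Cell phone", 5000), ("Thin", "Cell phone", 6000),
     ("Very thin", "Cell phone", 6000), ("Normal", "Tablet", 1500),
     ("Big", "Tablet", 2500), ("Pro", "Tablet", 4500),
     ("Mini", "Tablet", 5500), ("Pro2", "Tablet", 6500)],
    ["product", "category", "revenue"])

# Window spec: partition by category, order by revenue,
# frame from offset 2 to offset 2 relative to the current row
winSpec2 = (window.Window
            .partitionBy(data["category"])
            .orderBy(data["revenue"])
            .rowsBetween(2, 2))

tmp4 = functions.max(data["revenue"]).over(winSpec2)

(data.select(["product", "category", "revenue", tmp4.alias("rowbetween2and2")])
     .orderBy(["category", "revenue"])
     .show())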
Han-Cheol Cho
Data Laboratory / Data Scientist
Shinjuku East Side Square 13F, 6-27-30 Shinjuku, Shinjuku-ku, Tokyo 160-0022, Japan
Email: hancheol....@nhn-techorus.com