Hello, I am trying to test Spark SQL's window functions from the following blog post,
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html,
and I am facing a problem, as follows:

from pyspark.sql import functions, window

# testing rowsBetween(); data is a DataFrame with
# (product, category, revenue) columns
winSpec2 = (window.Window
            .partitionBy(data["category"])
            .orderBy(data["revenue"])
            .rowsBetween(2, 2))
tmp4 = functions.max(data["revenue"]).over(winSpec2)
data.select(["product", "category", "revenue",
             tmp4.alias("rowbetween2and2")]) \
    .orderBy(["category", "revenue"]).show()

+----------+----------+-------+---------------+
|   product|  category|revenue|rowbetween2and2|
+----------+----------+-------+---------------+
|  Bendable|Cell phone|   3000|           5000|
|  Foldable|Cell phone|   3000|           6000|
|Ultra thin|Cell phone|   5000|           6000|
|      Thin|Cell phone|   6000|           null|  --> ???
| Very thin|Cell phone|   6000|           null|
|    Normal|    Tablet|   1500|           4500|
|       Big|    Tablet|   2500|           5500|
|       Pro|    Tablet|   4500|           6500|
|      Mini|    Tablet|   5500|           null|
|      Pro2|    Tablet|   6500|           null|
+----------+----------+-------+---------------+

As you can see, the last column calculates the max revenue among the current row, the two rows before it, and the two rows after it within each category partition. However, the result for the last two rows of each partition is null. Is there something that I missed, or is this a bug?
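For reproducibility, here is a minimal sketch of the setup that the snippet above assumes. The data DataFrame is not shown in my mail, so the SparkSession entry point, the appName, and the construction below are assumptions; the rows themselves are taken from the output above.

from pyspark.sql import SparkSession

# Assumed setup (Spark 2.x-style entry point): build a DataFrame
# holding the same rows that appear in the output above.
spark = SparkSession.builder.appName("rowsBetween-test").getOrCreate()

rows = [
    ("Bendable",   "Cell phone", 3000),
    ("Foldable",   "Cell phone", 3000),
    ("Ultra thin", "Cell phone", 5000),
    ("Thin",       "Cell phone", 6000),
    ("Very thin",  "Cell phone", 6000),
    ("Normal",     "Tablet",     1500),
    ("Big",        "Tablet",     2500),
    ("Pro",        "Tablet",     4500),
    ("Mini",       "Tablet",     5500),
    ("Pro2",       "Tablet",     6500),
]
data = spark.createDataFrame(rows, ["product", "category", "revenue"])

With data built this way, running the snippet above should reproduce the output shown, including the nulls in the last two rows of each category partition.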
 
Han-Cheol Cho
Data Laboratory / Data Scientist
Shinjuku Eastside Square 13F, 6-27-30 Shinjuku, Shinjuku-ku, Tokyo 160-0022, Japan
Email: hancheol....@nhn-techorus.com


 
