Hi All,

I would like some clarification regarding window functions in Apache Spark 1.4.0:

- https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

In particular, the "rowsBetween" example:

{{{
val w = Window.partitionBy("name").orderBy("id")
df.select(
  sum("price").over(w.rangeBetween(Long.MinValue, 2)),
  avg("price").over(w.rowsBetween(0, 4))
)
}}}

Are any of the window functions available without a HiveContext? If the answer is no, is there any other way to accomplish the following without using Hive?

I need to compare the i-th row of col2 with the (i-1)-th row (sorted by col1). If the item in the i-th row differs from the item in the (i-1)-th row, then I need to increment the count of the (i-1)-th row's item by 1.

col1 | col2
-----+--------
  1  | item_1
  2  | item_1
  3  | item_2
  4  | item_1
  5  | item_2
  6  | item_1

In the above example, scanning two rows at a time downwards: row 2 and row 3 are different, so we add one to item_1. Next, row 3 differs from row 4, so we add one to item_2. Continuing to the end, we end up with:

col2   | col3
-------+------
item_1 |  2
item_2 |  2

Thanks,
Mike.
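P.S. To make the requirement concrete, here is a rough sketch of what I am trying to express with the 1.4 window functions, using the column names from my example (as I understand it, this still needs a HiveContext):

{{{
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// df has the columns col1 and col2 from the example above
val w = Window.orderBy("col1")

val counts = df
  .withColumn("next_item", lead("col2", 1).over(w))  // col2 of the following row
  .where(col("col2") !== col("next_item"))           // keep rows whose successor differs
  .groupBy("col2")
  .count()                                           // col2 -> number of changes
}}}

And here is the plain-RDD workaround I have been considering in case window functions really do require Hive; it is only a sketch (it assumes col1 is a Long), so I would welcome a more idiomatic approach:

{{{
// Sort by col1, key every row by its position, then pair each row with its successor.
val rows = df.select("col1", "col2").rdd
  .map(r => (r.getLong(0), r.getString(1)))          // assumes col1 is a Long
  .sortBy(_._1)
  .values
  .zipWithIndex()
  .map(_.swap)                                       // (rowIndex, item)

val nextRows = rows.map { case (i, item) => (i - 1, item) }  // successor keyed to previous index

val counts = rows.join(nextRows)                     // (i, (item_i, item_(i+1)))
  .filter { case (_, (cur, next)) => cur != next }   // keep rows where the item changes
  .map { case (_, (cur, _)) => (cur, 1L) }
  .reduceByKey(_ + _)
}}}

On the example data, both sketches should give item_1 -> 2 and item_2 -> 2.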