Check out: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
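The running total asked about below maps directly onto the window functions described in that post. A minimal sketch (untested here; assumes Spark 1.4+ with a HiveContext, a DataFrame `df` with columns `col_1`/`col_2`, and a registered table `tablea` -- all names taken from the question):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// Running total: sum col_2 over all rows up to and including the current
// one, ordered by col_1. No PARTITION BY clause, so all rows flow through
// a single task -- fine for small data, a bottleneck on large tables.
val w = Window.orderBy("col_1").rowsBetween(Long.MinValue, 0)
val result = df.withColumn("col_3", sum("col_2").over(w))

// The equivalent raw SQL (also needs a HiveContext in Spark 1.4/1.5):
// SELECT col_1, col_2,
//        SUM(col_2) OVER (ORDER BY col_1
//                         ROWS BETWEEN UNBOUNDED PRECEDING
//                         AND CURRENT ROW) AS col_3
// FROM tablea
```

Unlike the self-join suggested below, this is a single pass over the data rather than an O(n^2) join.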
On Thu, Oct 15, 2015 at 11:35 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:
> You can do a self join of the table with itself, with the join clause being
> a.col1 >= b.col1:
>
>     select a.col1, a.col2, sum(b.col2)
>     from tablea as a left outer join tablea as b on (a.col1 >= b.col1)
>     group by a.col1, a.col2
>
> I haven't tried it, but I can't see why it wouldn't work. Doing it on the
> RDD might be more efficient, though -- see
> https://bzhangusc.wordpress.com/2014/06/21/calculate-running-sums/
>
> On 15 October 2015 at 18:48, Stefan Panayotov <spanayo...@msn.com> wrote:
>> Hi,
>>
>> I need help with Spark SQL. I need to achieve something like the
>> following. If I have data like:
>>
>> col_1  col_2
>> 1      10
>> 2      30
>> 3      15
>> 4      20
>> 5      25
>>
>> I need to get col_3 to be the running total of the sum of the previous
>> rows of col_2, e.g.
>>
>> col_1  col_2  col_3
>> 1      10     10
>> 2      30     40
>> 3      15     55
>> 4      20     75
>> 5      25     100
>>
>> Is there a way to achieve this in Spark SQL, or maybe with DataFrame
>> transformations?
>>
>> Thanks in advance,
>>
>> Stefan Panayotov, PhD
>> Home: 610-355-0919
>> Cell: 610-517-5586
>> email: spanayo...@msn.com
>>        spanayo...@outlook.com
>>        spanayo...@comcast.net
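For the RDD route Deenar links to, the usual technique is two passes: first compute each partition's total and turn those into per-partition starting offsets, then accumulate within each partition from its offset. A hedged sketch of that idea (the name `data` and the pair layout are assumptions, not from the thread; assumes an RDD of (col_1, col_2) pairs):

```scala
// e.g. val data = sc.parallelize(Seq((1, 10), (2, 30), (3, 15), (4, 20), (5, 25)))
val sorted = data.sortByKey() // order by col_1 across partitions

// Pass 1: total of col_2 within each partition, collected to the driver.
val partTotals = sorted
  .mapPartitionsWithIndex { (i, it) => Iterator((i, it.map(_._2).sum)) }
  .collect()
  .sortBy(_._1)
  .map(_._2)

// Offset for partition i = sum of the totals of partitions 0 .. i-1.
val offsets = partTotals.scanLeft(0)(_ + _)

// Pass 2: accumulate within each partition, starting from its offset.
val withRunningTotal = sorted.mapPartitionsWithIndex { (i, it) =>
  var acc = offsets(i)
  it.map { case (k, v) => acc += v; (k, v, acc) }
}
// withRunningTotal is an RDD of (col_1, col_2, running total) triples
```

This keeps the work linear in the number of rows, at the cost of one small driver-side collect of per-partition totals.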