You can do a self join of the table with itself, with the join condition being a.col1 >= b.col1:
select a.col1, a.col2, sum(b.col2)
from tablea as a
left outer join tablea as b on (a.col1 >= b.col1)
group by a.col1, a.col2

I haven't tried it, but I can't see why it wouldn't work. Doing it with an RDD might be more efficient, though; see
https://bzhangusc.wordpress.com/2014/06/21/calculate-running-sums/

On 15 October 2015 at 18:48, Stefan Panayotov <spanayo...@msn.com> wrote:

> Hi,
>
> I need help with Spark SQL. I need to achieve something like the following.
> If I have data like:
>
> col_1  col_2
>   1     10
>   2     30
>   3     15
>   4     20
>   5     25
>
> I need to get col_3 to be the running total of col_2, i.e. the sum of the
> current row and all previous rows, e.g.
>
> col_1  col_2  col_3
>   1     10     10
>   2     30     40
>   3     15     55
>   4     20     75
>   5     25    100
>
> Is there a way to achieve this in Spark SQL, or maybe with DataFrame
> transformations?
>
> Thanks in advance,
>
> *Stefan Panayotov, PhD*
> *Home*: 610-355-0919
> *Cell*: 610-517-5586
> *email*: spanayo...@msn.com
>        spanayo...@outlook.com
>        spanayo...@comcast.net
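As a quick sanity check of the self-join idea on the sample data, here is a minimal sketch using sqlite3 from the Python standard library as a stand-in SQL engine (the query itself is plain SQL, so it should behave the same in Spark SQL; the table and column names are just the ones from the thread):

```python
# Sketch: check the self-join running-total query on the sample data,
# using an in-memory sqlite3 database as a stand-in for Spark SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablea (col1 integer, col2 integer)")
conn.executemany(
    "insert into tablea values (?, ?)",
    [(1, 10), (2, 30), (3, 15), (4, 20), (5, 25)],
)

# For each row a, join every row b with b.col1 <= a.col1 and sum b.col2:
# that sum is the running total up to and including row a.
rows = conn.execute("""
    select a.col1, a.col2, sum(b.col2) as col3
    from tablea as a
    left outer join tablea as b on (a.col1 >= b.col1)
    group by a.col1, a.col2
    order by a.col1
""").fetchall()

for col1, col2, col3 in rows:
    print(col1, col2, col3)
```

This prints the expected running totals 10, 40, 55, 75, 100. Note the join is quadratic in the number of rows, which is why the RDD approach linked above may be more efficient on large tables.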