You can do a self join of the table with itself, with the join condition being a.col1 >= b.col1:
select a.col1, a.col2, sum(b.col2)
from tablea as a
left outer join tablea as b on (a.col1 >= b.col1)
group by a.col1, a.col2

I haven't tried it, but I can't see why it wouldn't work. Doing it with an RDD might be more efficient, though; see
https://bzhangusc.wordpress.com/2014/06/21/calculate-running-sums/

On 15 October 2015 at 18:48, Stefan Panayotov <spanayo...@msn.com> wrote:

> Hi,
>
> I need help with Spark SQL. I need to achieve something like the following.
> If I have data like:
>
> col_1  col_2
>   1     10
>   2     30
>   3     15
>   4     20
>   5     25
>
> I need to get col_3 to be the running total of col_2, i.e. the sum of the
> current row and all previous rows, e.g.
>
> col_1  col_2  col_3
>   1     10     10
>   2     30     40
>   3     15     55
>   4     20     75
>   5     25    100
>
> Is there a way to achieve this in Spark SQL, or maybe with DataFrame
> transformations?
>
> Thanks in advance,
>
> *Stefan Panayotov, PhD*
> *Home*: 610-355-0919
> *Cell*: 610-517-5586
> *email*: spanayo...@msn.com
>        spanayo...@outlook.com
>        spanayo...@comcast.net
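As a quick sanity check of the self-join idea on the sample data, here is a minimal sketch using sqlite3 from the Python standard library as a stand-in SQL engine (the query itself is plain SQL, so it should behave the same in Spark SQL; the table and column names are just the ones from the thread):

```python
# Sketch: check the self-join running-total query on the sample data,
# using an in-memory sqlite3 database as a stand-in for Spark SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablea (col1 integer, col2 integer)")
conn.executemany(
    "insert into tablea values (?, ?)",
    [(1, 10), (2, 30), (3, 15), (4, 20), (5, 25)],
)

# For each row a, join every row b with b.col1 <= a.col1 and sum b.col2:
# that sum is the running total up to and including row a.
rows = conn.execute("""
    select a.col1, a.col2, sum(b.col2) as col3
    from tablea as a
    left outer join tablea as b on (a.col1 >= b.col1)
    group by a.col1, a.col2
    order by a.col1
""").fetchall()

for col1, col2, col3 in rows:
    print(col1, col2, col3)
```

This prints the expected running totals 10, 40, 55, 75, 100. Note the join is quadratic in the number of rows, which is why the RDD approach linked above may be more efficient on large tables.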