Check out:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
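The window-function approach the blog post describes can be sketched as below. To keep the example self-contained it runs on SQLite (version 3.25+ supports the same `SUM(...) OVER (...)` syntax), but the SELECT itself is the kind of query you would issue through Spark SQL:

```python
# Illustration of a window-function running total. SQLite (3.25+) accepts
# the same SUM(...) OVER (...) syntax as Spark SQL, so this sketch is
# runnable as-is; only the connection setup is SQLite-specific.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (col_1 INTEGER, col_2 INTEGER)")
conn.executemany("INSERT INTO tablea VALUES (?, ?)",
                 [(1, 10), (2, 30), (3, 15), (4, 20), (5, 25)])

# The default frame for SUM(...) OVER (ORDER BY ...) is
# RANGE UNBOUNDED PRECEDING, i.e. a cumulative sum.
rows = conn.execute("""
    SELECT col_1, col_2,
           SUM(col_2) OVER (ORDER BY col_1) AS col_3
    FROM tablea
    ORDER BY col_1
""").fetchall()
print(rows)
# [(1, 10, 10), (2, 30, 40), (3, 15, 55), (4, 20, 75), (5, 25, 100)]
```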

On Thu, Oct 15, 2015 at 11:35 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

> You can do a self-join of the table, with the join clause being
> a.col1 >= b.col1:
>
> select a.col1, a.col2, sum(b.col2)
> from tablea as a left outer join tablea as b on (a.col1 >= b.col1)
> group by a.col1, a.col2
>
> I haven't tried it, but I can't see why it wouldn't work. Doing it with an
> RDD might be more efficient; see
> https://bzhangusc.wordpress.com/2014/06/21/calculate-running-sums/
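The self-join suggestion above can be checked end to end with plain SQL (nothing Spark-specific is involved), here run on SQLite with the data from the original question:

```python
# Check of the self-join running total: for each row a, the LEFT OUTER JOIN
# picks up every row b with b.col1 <= a.col1, and SUM(b.col2) adds them up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO tablea VALUES (?, ?)",
                 [(1, 10), (2, 30), (3, 15), (4, 20), (5, 25)])

result = conn.execute("""
    SELECT a.col1, a.col2, SUM(b.col2) AS col3
    FROM tablea AS a
    LEFT OUTER JOIN tablea AS b ON a.col1 >= b.col1
    GROUP BY a.col1, a.col2
    ORDER BY a.col1
""").fetchall()
print(result)
# [(1, 10, 10), (2, 30, 40), (3, 15, 55), (4, 20, 75), (5, 25, 100)]
```

Note that the join produces O(n²) intermediate rows, which is why the linked post suggests an RDD-based approach may be more efficient for large tables.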
>
> On 15 October 2015 at 18:48, Stefan Panayotov <spanayo...@msn.com> wrote:
>
>> Hi,
>>
>> I need help with Spark SQL. I need to achieve something like the
>> following.
>> If I have data like:
>>
>> col_1  col_2
>> 1         10
>> 2         30
>> 3         15
>> 4         20
>> 5         25
>>
>> I need col_3 to be the running total of col_2, i.e. the sum of col_2 over
>> the current row and all previous rows, e.g.
>>
>> col_1  col_2  col_3
>> 1         10        10
>> 2         30        40
>> 3         15        55
>> 4         20        75
>> 5         25        100
>>
>> Is there a way to achieve this in Spark SQL, or maybe with DataFrame
>> transformations?
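The target column is just a cumulative sum, col_3[i] = col_2[1] + ... + col_2[i]; a quick check of the sample numbers:

```python
from itertools import accumulate

col_2 = [10, 30, 15, 20, 25]
col_3 = list(accumulate(col_2))  # running total of col_2
print(col_3)
# [10, 40, 55, 75, 100]
```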
>>
>> Thanks in advance,
>>
>>
>> *Stefan Panayotov, PhD **Home*: 610-355-0919
>> *Cell*: 610-517-5586
>> *email*: spanayo...@msn.com
>> spanayo...@outlook.com
>> spanayo...@comcast.net
>>
>>
>
>
