You can do this, and many other transformations, easily with window
functions; see this blog post:

https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

In your case you would do (in Scala):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.{functions => func}

// Order rows by col1. With an orderBy and no explicit frame, the window
// defaults to "unbounded preceding to current row", so sum("col2")
// becomes a running total.
val wo = Window.orderBy("col1")
val newdf = df.withColumn("col3", func.sum("col2").over(wo))
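As a sanity check, here is the same running-total logic in plain Python
(no Spark required) on the sample data from your mail, just to show what
the window sum computes:

```python
# Cumulative sum over col_2 after ordering rows by col_1 --
# the same result sum("col2").over(Window.orderBy("col1")) produces.
rows = [(1, 10), (2, 30), (3, 15), (4, 20), (5, 25)]  # (col_1, col_2)

running = 0
result = []
for col1, col2 in sorted(rows):           # Window.orderBy("col1")
    running += col2                       # frame: start through current row
    result.append((col1, col2, running))  # the new col_3

print(result)
# [(1, 10, 10), (2, 30, 40), (3, 15, 55), (4, 20, 75), (5, 25, 100)]
```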

The real power of this comes when you need a running total per group,
say per user (stored in a different column)! :)

HTH,
Kristina

On Thu, Oct 15, 2015 at 1:48 PM, Stefan Panayotov <spanayo...@msn.com>
wrote:

> Hi,
>
> I need help with Spark SQL. I need to achieve something like the following.
> If I have data like:
>
> col_1  col_2
> 1      10
> 2      30
> 3      15
> 4      20
> 5      25
>
> I need to get col_3 to be the running total of the sum of the previous
> rows of col_2, e.g.
>
> col_1  col_2  col_3
> 1      10     10
> 2      30     40
> 3      15     55
> 4      20     75
> 5      25     100
>
> Is there a way to achieve this in Spark SQL or maybe with Data frame
> transformations?
>
> Thanks in advance,
>
>
> *Stefan Panayotov, PhD **Home*: 610-355-0919
> *Cell*: 610-517-5586
> *email*: spanayo...@msn.com
> spanayo...@outlook.com
> spanayo...@comcast.net
>
>
