You can do this and many other transformations very easily with window functions; see this blog post:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

In your case you would do (in Scala):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.{functions => func}

val wo = Window.orderBy("col_1")
val newdf = df.withColumn("col_3", func.sum("col_2").over(wo))

The real power of this shows when you need a running total per group, say, per user (stored in a separate column). In that case you just add partitionBy to the window spec (see the sketch after the quoted message below).

HTH,
Kristina

On Thu, Oct 15, 2015 at 1:48 PM, Stefan Panayotov <spanayo...@msn.com> wrote:

> Hi,
>
> I need help with Spark SQL. I need to achieve something like the following.
> If I have data like:
>
> col_1  col_2
> 1      10
> 2      30
> 3      15
> 4      20
> 5      25
>
> I need col_3 to be the running total of col_2, i.e. the sum of the current
> and all previous rows of col_2, e.g.:
>
> col_1  col_2  col_3
> 1      10     10
> 2      30     40
> 3      15     55
> 4      20     75
> 5      25     100
>
> Is there a way to achieve this in Spark SQL, or maybe with DataFrame
> transformations?
>
> Thanks in advance,
>
> Stefan Panayotov, PhD
> Home: 610-355-0919
> Cell: 610-517-5586
> email: spanayo...@msn.com
>        spanayo...@outlook.com
>        spanayo...@comcast.net
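
P.S. For the per-user variant mentioned above, a minimal sketch might look like the following. It assumes a hypothetical "user" column alongside col_1 and col_2 (not part of the original example data):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.{functions => func}

// Partition by user so the running total restarts for each user,
// then order within each partition to define what "previous rows" means.
val wUser = Window.partitionBy("user").orderBy("col_1")
val perUser = df.withColumn("col_3", func.sum("col_2").over(wUser))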
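
Since the question also asks about plain Spark SQL: the same window expression can be written in SQL. A rough sketch, assuming the DataFrame is registered under the hypothetical name "df_table" and that sqlContext is a HiveContext (window functions in Spark 1.x SQL generally need Hive support):

// Register the DataFrame so it can be queried by name.
df.registerTempTable("df_table")

val bySql = sqlContext.sql(
  """SELECT col_1, col_2,
    |       SUM(col_2) OVER (ORDER BY col_1) AS col_3
    |FROM df_table""".stripMargin)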