Re: Adding a column to a SchemaRDD

Yanbo Liang Fri, 12 Dec 2014 03:03:05 -0800

An RDD is immutable, so you cannot modify it in place.
If you want to change some values or the schema of an RDD, use map to generate a
new RDD.
The following code is for your reference:


def add(a:Int,b:Int):Int = {
  a + b
}

val d1 = sc.parallelize(1 to 10).map { i => (i, i+1, i+2) }
val d2 = d1.map { i => (i._1, i._2, add(i._1, i._2))}
d2.foreach(println)
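
Applied to the columns from the question (A, B, C plus a computed D), the same
map pattern might look roughly like the sketch below. It uses a plain Scala Seq
in place of an RDD so it runs without Spark. The Int column types, the Abc and
Abcd case classes, and the body of Utility.process are all assumptions made for
illustration, since the real function is not shown in the thread.

```scala
// Hypothetical stand-in for the Utility.process(b, c) mentioned in the question;
// the real function's behaviour is not given, so this one just multiplies.
object Utility {
  def process(b: Int, c: Int): Int = b * c
}

// Row types before and after adding column D (names and types are illustrative).
case class Abc(a: Int, b: Int, c: Int)
case class Abcd(a: Int, b: Int, c: Int, d: Int)

object AddColumnSketch {
  // Builds a new collection; the input is left untouched, as with an RDD.
  def addD(rows: Seq[Abc]): Seq[Abcd] =
    rows.map { r => Abcd(r.a, r.b, r.c, Utility.process(r.b, r.c)) }

  def main(args: Array[String]): Unit = {
    val rows = (1 to 3).map { i => Abc(i, i + 1, i + 2) }
    addD(rows).foreach(println)
  }
}
```

On an RDD[Abc] the body of the map call should stay the same; only the
collection type changes.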


Otherwise, if your user-defined function is straightforward and can be
expressed in SQL, using Spark SQL or its DSL is also a good choice.

case class Person(id: Int, score: Int, value: Int)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

import sqlContext._

val d1 = sc.parallelize(1 to 10).map { i => Person(i,i+1,i+2)}
val d2 = d1.select('id, 'score, 'id + 'score)
d2.foreach(println)


2014-12-12 14:11 GMT+08:00 Nathan Kronenfeld <nkronenf...@oculusinfo.com>:

> Hi, there.
>
> I'm trying to understand how to augment data in a SchemaRDD.
>
> I can see how to do it if I can express the added values in SQL - just run
> "SELECT *,valueCalculation AS newColumnName FROM table"
>
> I've been searching all over for how to do this if my added value is a
> Scala function, with no luck.
>
> Let's say I have a SchemaRDD with columns A, B, and C, and I want to add a
> new column, D, calculated using Utility.process(b, c), and I want (of
> course) to pass in the value B and C from each row, ending up with a new
> SchemaRDD with columns A, B, C, and D.
>
> Is this possible? If so, how?
>
> Thanks,
>                    -Nathan
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  nkronenf...@oculusinfo.com
>
