Hi,

I just checked, and there is a method called withColumn:

def withColumn(colName: String, col: Column <http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Column.html>): DataFrame <http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrame.html>

Returns a new DataFrame <http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrame.html> by adding a column.

I can't test it now, but I think it should work. As I see it, the whole idea behind data frames is to make them like data frames in R, and in R you can do that easily.

It was late last night and I was tired, but my idea was this: iterate over the first set and add an index to every log using accumulators, then iterate over the other set and add an index from another accumulator, then create tuples keyed by those indexes and join. It is ugly and not efficient, and you should avoid it. :]

Best,
Bojan

On Thu, Apr 9, 2015 at 1:35 AM, barmaley [via Apache Spark User List] <ml-node+s1001560n22430...@n3.nabble.com> wrote:

> Hi Bojan,
>
> Could you please expand your idea on how to append to RDD? I can think of
> how to append a constant value to each row on RDD:
>
> //oldRDD - RDD[Array[String]]
> val c = "const"
> val newRDD = oldRDD.map(r=>c+:r)
>
> But how to append a custom column to RDD? Something like:
>
> val colToAppend = sc.makeRDD(1 to oldRDD.count().toInt)
> //or sc.parallelize(1 to oldRDD.count().toInt)
> //or (1 to oldRDD.count().toInt).toArray
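
For what it's worth, here is a rough sketch of the index-and-join idea described above. It swaps the accumulators for RDD.zipWithIndex, which is my substitution and not what was originally proposed, and it assumes both RDDs have the same number of elements. The names AppendColumnSketch and appendColumn are made up for illustration, and I have only reasoned through it, so treat it as a starting point rather than a tested recipe. Note that withColumn expects a Column expression built from the same DataFrame, so it may not help directly when the new column lives in a separate RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of Spark.
object AppendColumnSketch {

  // Key every element of both RDDs by its position, join on that key,
  // then append the joined value to the end of each row.
  // Assumes rows and col contain the same number of elements.
  // (On Spark versions before 1.3 the pair-RDD methods may also need
  //  `import org.apache.spark.SparkContext._`.)
  def appendColumn(rows: RDD[Array[String]], col: RDD[String]): RDD[Array[String]] = {
    val indexedRows = rows.zipWithIndex().map { case (row, i) => (i, row) }
    val indexedCol  = col.zipWithIndex().map { case (value, i) => (i, value) }

    indexedRows.join(indexedCol)   // RDD[(Long, (Array[String], String))]
      .sortByKey()                 // restore the original row order
      .values
      .map { case (row, value) => row :+ value }
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("append-column-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val oldRDD = sc.parallelize(Seq(Array("a", "b"), Array("c", "d"), Array("e", "f")))
      val colToAppend = sc.parallelize(Seq("1", "2", "3"))

      val newRDD = appendColumn(oldRDD, colToAppend)
      newRDD.collect().foreach(row => println(row.mkString(",")))
    } finally {
      sc.stop()
    }
  }
}
```

If the column to append is just a row ID, the join is unnecessary: oldRDD.zipWithIndex() on its own already pairs each row with its index. The join involves a shuffle, so this is still not cheap on large data.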
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Append-column-to-Data-Frame-or-RDD-tp22385p22432.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.