It very much depends on the logic that generates the new rows. Is it per row (i.e. without context?) then you can just convert to RDD and perform a map operation on each row.
JavaPairRDD<Object, Iterable<Row>> grouped = dataFrame.javaRDD().groupBy( group by what you need, probably ID ); return grouped.mapValues(rowsIt -> { List<Row> rows = Lists.newArrayList(rowsIt); return new list of rows based on you logic. }); convert back to DataFrame using flatMap and createDataFrame -- Jan Sterba https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba On Fri, Mar 11, 2016 at 8:49 PM, Michael Armbrust <mich...@databricks.com> wrote: > Or look at explode on DataFrame > > On Fri, Mar 11, 2016 at 10:45 AM, Stefan Panayotov <spanayo...@msn.com> > wrote: >> >> Hi, >> >> I have a problem that requires me to go through the rows in a DataFrame >> (or possibly through rows in a JSON file) and conditionally add rows >> depending on a value in one of the columns in each existing row. So, for >> example if I have: >> >> >> +---+---+---+ >> >> | _1| _2| _3| >> +---+---+---+ >> |ID1|100|1.1| >> |ID2|200|2.2| >> |ID3|300|3.3| >> |ID4|400|4.4| >> +---+---+---+ >> >> I need to be able to get: >> >> >> +---+---+---+--------------------+---+ >> >> | _1| _2| _3| _4| _5| >> +---+---+---+--------------------+---+ >> |ID1|100|1.1|ID1 add text or d...| 25| >> |id11 ..|21 | >> |id12 ..|22 | >> |ID2|200|2.2|ID2 add text or d...| 50| >> |id21 ..|33 | >> |id22 ..|34 | >> |id23 ..|35 | >> |ID3|300|3.3|ID3 add text or d...| 75| >> |id31 ..|11 | >> |ID4|400|4.4|ID4 add text or d...|100| >> |id41 ..|51 | >> |id42 ..|52 | >> |id43 ..|53 | >> |id44 ..|54 | >> +---+---+---+--------------------+---+ >> >> How can I achieve this in Spark without doing DF.collect(), which will get >> everything to the driver and for a big data set I'll get OOM? >> BTW, I know how to use withColumn() to add new columns to the DataFrame. I >> need to also add new rows. >> Any help will be appreciated. >> >> Thanks, >> >> Stefan Panayotov, PhD >> Home: 610-355-0919 >> Cell: 610-517-5586 >> email: spanayo...@msn.com >> spanayo...@outlook.com >> spanayo...@comcast.net >> > > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org