Hi,

I have a problem that requires me to go through the rows in a DataFrame (or possibly through rows in a JSON file) and conditionally add rows depending on a value in one of the columns of each existing row. So, for example, if I have:

+---+---+---+
| _1| _2| _3|
+---+---+---+
|ID1|100|1.1|
|ID2|200|2.2|
|ID3|300|3.3|
|ID4|400|4.4|
+---+---+---+

I need to be able to get:

+---+---+---+--------------------+---+
| _1| _2| _3|                  _4| _5|
+---+---+---+--------------------+---+
|ID1|100|1.1|ID1 add text or d...| 25|
|id11 ..|21 |
|id12 ..|22 |
|ID2|200|2.2|ID2 add text or d...| 50|
|id21 ..|33 |
|id22 ..|34 |
|id23 ..|35 |
|ID3|300|3.3|ID3 add text or d...| 75|
|id31 ..|11 |
|ID4|400|4.4|ID4 add text or d...|100|
|id41 ..|51 |
|id42 ..|52 |
|id43 ..|53 |
|id44 ..|54 |
+---+---+---+--------------------+---+

How can I achieve this in Spark without calling DF.collect(), which would pull everything to the driver and cause an OOM on a big data set?

BTW, I know how to use withColumn() to add new columns to the DataFrame. I need to also add new rows.

Any help will be appreciated.

Thanks,

Stefan Panayotov, PhD
Home: 610-355-0919
Cell: 610-517-5586
email: spanayo...@msn.com
       spanayo...@outlook.com
       spanayo...@comcast.net
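[In case it helps frame an answer: one way to do this without collect() is a per-row expansion function passed to flatMap, which Spark runs on the executors. Below is a minimal plain-Python sketch of that idea; the list `rows` stands in for the RDD, and `expand()` / `lookup_children()` are hypothetical names for however the new rows are derived from the ID column. In Spark proper this would be `df.rdd.flatMap(expand)`; note the ragged child tuples would need to be padded with nulls to go back into a single DataFrame.]

```python
def lookup_children(rid):
    # Hypothetical stand-in: derive the child rows from the parent ID.
    # In the real job this might be a lookup, a parse of a nested JSON
    # field, or a computation on the row's values.
    count = {"ID1": 2, "ID2": 3, "ID3": 1, "ID4": 4}[rid]
    return [(f"id{rid[2]}{i}", i) for i in range(1, count + 1)]

def expand(row):
    # Emit the original row with the extra columns (_4, _5), followed by
    # one short row per child -- mirroring the desired output table.
    rid, v2, v3 = row
    out = [(rid, v2, v3, f"{rid} add text or data", v2 // 4)]
    out.extend(lookup_children(rid))
    return out

rows = [("ID1", 100, 1.1), ("ID2", 200, 2.2),
        ("ID3", 300, 3.3), ("ID4", 400, 4.4)]

# Plain-Python equivalent of rows_rdd.flatMap(expand): each input row
# maps to a parent row plus its children, all flattened into one sequence.
expanded = [r for row in rows for r in expand(row)]
```

Because flatMap processes rows partition by partition on the executors, nothing beyond the per-partition working set is ever materialized on the driver.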