Hi,
 
I have a problem that requires me to go through the rows in a DataFrame (or 
possibly through rows in a JSON file) and conditionally add new rows depending 
on a value in one of the columns of each existing row. So, for example, if I have:
 
+---+---+---+
| _1| _2| _3|
+---+---+---+
|ID1|100|1.1|
|ID2|200|2.2|
|ID3|300|3.3|
|ID4|400|4.4|
+---+---+---+

I need to be able to get:
 
+---+---+---+--------------------+---+
| _1| _2| _3|                  _4| _5|
+---+---+---+--------------------+---+
|ID1|100|1.1|ID1 add text or d...| 25|
|id11 ..|21 |
|id12 ..|22 |
|ID2|200|2.2|ID2 add text or d...| 50|
|id21 ..|33 |
|id22 ..|34 |
|id23 ..|35 |
|ID3|300|3.3|ID3 add text or d...| 75|
|id31 ..|11 |
|ID4|400|4.4|ID4 add text or d...|100|
|id41 ..|51 |
|id42 ..|52 |
|id43 ..|53 |
|id44 ..|54 |
+---+---+---+--------------------+---+

How can I achieve this in Spark without doing DF.collect(), which would pull 
everything to the driver and, for a big data set, cause an OOM error?
BTW, I know how to use withColumn() to add new columns to the DataFrame; what 
I need is a way to also add new rows.
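
To make the intent concrete, here is a rough sketch of the kind of thing I'm 
imagining, using flatMap on a typed Dataset so the expansion stays distributed 
and nothing is collected to the driver. childRowsFor() and the _4/_5 values 
below are just placeholders for my real lookup logic, and I'm not sure this 
is the idiomatic approach:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("expand-rows").getOrCreate()
import spark.implicits._

// Toy version of the input DataFrame from the example above.
val df = Seq(
  ("ID1", 100, 1.1),
  ("ID2", 200, 2.2)
).toDF("_1", "_2", "_3")

// Placeholder: derive the child rows from a parent row's ID.
// In my real case this would come from the value in one of the columns.
def childRowsFor(id: String): Seq[(String, Int)] =
  Seq((id.toLowerCase + "1 ..", 21), (id.toLowerCase + "2 ..", 22))

// Emit each parent row followed by its child rows. Options are used so
// parent and child rows can share one schema, with null where a column
// doesn't apply.
val expanded = df.as[(String, Int, Double)].flatMap { case (id, v2, v3) =>
  val parent =
    (Option(id), Option(v2), Option(v3), Option(id + " add text"), Option(25))
  val children = childRowsFor(id).map { case (cid, n) =>
    (Option.empty[String], Option.empty[Int], Option.empty[Double],
     Option(cid), Option(n))
  }
  parent +: children
}.toDF("_1", "_2", "_3", "_4", "_5")

expanded.show()

If there is a better pattern, e.g., building up an array column and using 
explode(), I'd be glad to hear about that too.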
Any help will be appreciated.

Thanks,


Stefan Panayotov, PhD 
Home: 610-355-0919 
Cell: 610-517-5586 
email: spanayo...@msn.com 
spanayo...@outlook.com 
spanayo...@comcast.net