Or look at explode on DataFrame.
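The idea would be: for each input row, build an array column holding every output row you want (the original "header" row plus the extra rows it should generate), then explode that column so each array element becomes its own row. The expansion runs on the executors, so nothing has to be collected to the driver. A rough, self-contained sketch against the Spark 2.x Scala API; the OutRow field names and the expansion rule below (one extra row per 100 in _2, and the 21/22/25 values) are made up for illustration, not your actual logic:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, udf}

object ExplodeSketch {

  // One output row; each array element becomes a row after explode().
  case class OutRow(id: String, value: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explode-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Input shaped like the example in the question.
    val df = Seq(
      ("ID1", 100, 1.1),
      ("ID2", 200, 2.2),
      ("ID3", 300, 3.3),
      ("ID4", 400, 4.4)
    ).toDF("_1", "_2", "_3")

    // For every input row, build ALL of its output rows: the "header" row
    // plus however many extra rows it should produce. The rule used here
    // is purely illustrative; the real one comes from your domain logic.
    val expand = udf { (id: String, v: Int) =>
      val header  = OutRow(id, v / 4)
      val details = (1 to v / 100).map(i => OutRow(s"${id.toLowerCase}$i", 20 + i))
      header +: details
    }

    df.withColumn("rows", expand($"_1", $"_2"))  // array<struct> column
      .select(explode($"rows").as("r"))          // one row per array element
      .select($"r.id", $"r.value")
      .show(truncate = false)

    spark.stop()
  }
}

The same shape should work from PySpark as well: a UDF returning an array of structs, followed by pyspark.sql.functions.explode.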
On Fri, Mar 11, 2016 at 10:45 AM, Stefan Panayotov <spanayo...@msn.com> wrote:

> Hi,
>
> I have a problem that requires me to go through the rows in a DataFrame
> (or possibly through rows in a JSON file) and conditionally add rows
> depending on a value in one of the columns in each existing row. So, for
> example if I have:
>
> +---+---+---+
> | _1| _2| _3|
> +---+---+---+
> |ID1|100|1.1|
> |ID2|200|2.2|
> |ID3|300|3.3|
> |ID4|400|4.4|
> +---+---+---+
>
> I need to be able to get:
>
> +---+---+---+--------------------+---+
> | _1| _2| _3|                  _4| _5|
> +---+---+---+--------------------+---+
> |ID1|100|1.1|ID1 add text or d...| 25|
> |id11 ..|21 |
> |id12 ..|22 |
> |ID2|200|2.2|ID2 add text or d...| 50|
> |id21 ..|33 |
> |id22 ..|34 |
> |id23 ..|35 |
> |ID3|300|3.3|ID3 add text or d...| 75|
> |id31 ..|11 |
> |ID4|400|4.4|ID4 add text or d...|100|
> |id41 ..|51 |
> |id42 ..|52 |
> |id43 ..|53 |
> |id44 ..|54 |
> +---+---+---+--------------------+---+
>
> How can I achieve this in Spark without doing DF.collect(), which will get
> everything to the driver and for a big data set I'll get OOM?
>
> BTW, I know how to use withColumn() to add new columns to the DataFrame. I
> need to also add new rows.
>
> Any help will be appreciated.
>
> Thanks,
>
> Stefan Panayotov, PhD
> Home: 610-355-0919
> Cell: 610-517-5586
> email: spanayo...@msn.com
>        spanayo...@outlook.com
>        spanayo...@comcast.net
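Another option that avoids collect() is a flatMap over the Dataset (or over df.rdd), emitting the header row plus the generated detail rows for each input row. A sketch, reusing the spark session and df from the example above, with the same made-up expansion rule:

import spark.implicits._

// Each input row maps to a Seq of output rows; flatMap flattens them,
// and the work stays distributed across the executors.
val expanded = df.as[(String, Int, Double)].flatMap { case (id, v, _) =>
  val header  = (id, v / 4)
  val details = (1 to v / 100).map(i => (s"${id.toLowerCase}$i", 20 + i))
  header +: details
}.toDF("id", "value")

expanded.show(truncate = false)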