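Here is another way you can achieve that (in Python):

# Add a new column. The second argument must be a Column expression,
# not a string. withColumn returns a new DataFrame.
base_df.withColumn("column_name", column_expression_for_new_column)

# To add a new row, create a DataFrame containing the new row and do the unionAll()
base_df.unionAll(new_df)
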
# Another approach: convert to an RDD, add the required fields, and convert
# back to a DataFrame
def update_row(row):
    """Add extra columns according to your logic."""
    # Example: append two placeholder values to the row
    updated_row = row + ("Text", "number",)
    return updated_row

updated_row_rdd = base_df.rdd.map(update_row)

# Convert back to a DataFrame, giving the schema
updated_df = sql_context.createDataFrame(updated_row_rdd, schema)

# To add an extra row, create a new DataFrame with the new row and do the unionAll
result_df = updated_df.unionAll(new_row_df)

Thanks,
Bijay

On Fri, Mar 11, 2016 at 11:49 AM, Michael Armbrust <mich...@databricks.com>
wrote:

> Or look at explode on DataFrame
>
> On Fri, Mar 11, 2016 at 10:45 AM, Stefan Panayotov <spanayo...@msn.com>
> wrote:
>
>> Hi,
>>
>> I have a problem that requires me to go through the rows in a DataFrame
>> (or possibly through rows in a JSON file) and conditionally add rows
>> depending on a value in one of the columns in each existing row. So, for
>> example if I have:
>>
>>
>> +---+---+---+
>> | _1| _2| _3|
>> +---+---+---+
>> |ID1|100|1.1|
>> |ID2|200|2.2|
>> |ID3|300|3.3|
>> |ID4|400|4.4|
>> +---+---+---+
>>
>> I need to be able to get:
>>
>>
>> +---+---+---+--------------------+---+
>> | _1| _2| _3|                  _4| _5|
>> +---+---+---+--------------------+---+
>> |ID1|100|1.1|ID1 add text or d...| 25|
>> |id11 ..|21 |
>> |id12 ..|22 |
>> |ID2|200|2.2|ID2 add text or d...| 50|
>> |id21 ..|33 |
>> |id22 ..|34 |
>> |id23 ..|35 |
>> |ID3|300|3.3|ID3 add text or d...| 75|
>> |id31 ..|11 |
>> |ID4|400|4.4|ID4 add text or d...|100|
>> |id41 ..|51 |
>> |id42 ..|52 |
>> |id43 ..|53 |
>> |id44 ..|54 |
>> +---+---+---+--------------------+---+
>>
>> How can I achieve this in Spark without doing DF.collect(), which will
>> get everything to the driver and for a big data set I'll get OOM?
>> BTW, I know how to use withColumn() to add new columns to the DataFrame.
>> I need to also add new rows.
>> Any help will be appreciated.
>>
>> Thanks,
>>
>>
>> *Stefan Panayotov, PhD*
>> *Home*: 610-355-0919
>> *Cell*: 610-517-5586
>> *email*: spanayo...@msn.com
>> spanayo...@outlook.com
>> spanayo...@comcast.net
>>
>>
>
>
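
For the quoted question, where the number of rows to add depends on a value in each existing row, one possible variation on the RDD approach is to have the per-row function return a list of rows and use flatMap instead of map. This is only a sketch and not something proposed in the thread as written. The function name expand_row, the rule for how many rows to add (one per full 100 in _2), the _2 // 4 value for _5, and the placeholder text are all assumptions for illustration. It also assumes _2 holds integers. The function is plain Python, so it can be tried without a cluster:

```python
def expand_row(row):
    """Turn one input row (_1, _2, _3) into a list of five-column rows.

    The first output row is the input row widened with two new columns.
    The remaining rows are extra rows whose count depends on the value
    in _2. The rule and the placeholder values are hypothetical.
    """
    key, amount, ratio = row
    # Widened copy of the original row (new columns _4 and _5)
    parent = (key, amount, ratio, key + " add text", amount // 4)
    # Hypothetical rule: one extra row per full 100 in _2
    n_extra = amount // 100
    # Extra rows only fill _4; the other columns are left as None
    children = [
        (None, None, None, "{}{}".format(key.lower(), i), None)
        for i in range(1, n_extra + 1)
    ]
    return [parent] + children


# Local check on plain tuples, mimicking what flatMap would do per row
rows = [("ID1", 100, 1.1), ("ID2", 200, 2.2)]
expanded = [out for row in rows for out in expand_row(row)]
for out in expanded:
    print(out)
```

On the cluster the same function could be applied with base_df.rdd.flatMap(expand_row), and the result passed to sql_context.createDataFrame(...) together with an explicit five-column schema with nullable fields, since the None values prevent reliable type inference. Nothing is collected to the driver in that case.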