I have a data frame that I sort using the orderBy function. This operation causes my data frame to end up in a single partition. After using those results, I would like to repartition it to a larger number of partitions. Currently I am just doing:
val rdd = df.rdd.coalesce(100, true) // df is a data frame with a single partition and around 14 million records
val newDF = hc.createDataFrame(rdd, df.schema)

This process is really slow. Is there any other way to achieve this, or to optimize it (perhaps by tweaking a Spark configuration parameter)?

Thanks a lot,

-- Cesar Flores
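For context, here is a minimal sketch of the pipeline described above. It assumes a HiveContext named `hc` (as in the snippet) and a hypothetical sort column `"value"`; neither the column name nor the partition count is meant to be authoritative.

```scala
// Sketch of the current approach, under the assumptions stated above.
// After the sort, the data frame is observed to sit in one partition.
val sorted = df.orderBy("value")

// coalesce with shuffle = true forces a full shuffle, so the 100
// target partitions can be populated in parallel rather than by
// merging existing partitions locally.
val rdd = sorted.rdd.coalesce(100, shuffle = true)

// Rebuild a data frame from the repartitioned RDD, reusing the schema.
val newDF = hc.createDataFrame(rdd, sorted.schema)
```

For comparison, `DataFrame.repartition(n)` also performs a shuffle into `n` partitions without dropping down to the RDD API; whether it is faster for this workload is exactly the open question.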