I have a DataFrame that I sort using the orderBy function. This operation
leaves the DataFrame in a single partition. After using those results, I
would like to repartition it to a larger number of partitions.
Currently I am just doing:

// df is a DataFrame with a single partition and around 14 million records
val rdd = df.rdd.coalesce(100, shuffle = true)
val newDF = hc.createDataFrame(rdd, df.schema)

This process is really slow. Is there another way to achieve this, or a
way to optimize it (perhaps by tweaking a Spark configuration parameter)?
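For reference, here is a sketch of what I understand the DataFrame-level
equivalent would be, avoiding the round trip through the RDD API entirely
(this assumes the DataFrame.repartition method available since Spark 1.3;
df and the target count of 100 are the same as above):

// Sketch only: repartition directly on the DataFrame instead of
// converting to an RDD, coalescing, and rebuilding the DataFrame.
// This still triggers a full shuffle of the single partition.
val newDF = df.repartition(100)

I am not sure whether this would perform any better than the coalesce
version, since both ultimately shuffle the same data.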


Thanks a lot
-- 
Cesar Flores
