I have a DataFrame that I sort using the orderBy function. This operation
leaves the DataFrame in a single partition. After using those results, I
would like to repartition it to a larger number of partitions.
Currently I am just doing:

// df is a DataFrame with a single partition and around 14 million records
val rdd = df.rdd.coalesce(100, shuffle = true)
val newDF = hc.createDataFrame(rdd, df.schema)

This process is really slow. Is there another way to achieve this, or a
way to optimize it (perhaps by tweaking a Spark configuration parameter)?
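For reference, here is a sketch of what I understand the DataFrame-level
equivalent would be, avoiding the round trip through the RDD API entirely
(this assumes the DataFrame.repartition method available since Spark 1.3;
df and the target count of 100 are the same as above):

// Sketch only: repartition directly on the DataFrame instead of
// converting to an RDD, coalescing, and rebuilding the DataFrame.
// This still triggers a full shuffle of the single partition.
val newDF = df.repartition(100)

I am not sure whether this would perform any better than the coalesce
version, since both ultimately shuffle the same data.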


Thanks a lot
-- 
Cesar Flores
