Spark Sorted DataFrame & Repartitioning

Night Wolf Wed, 13 May 2015 07:10:27 -0700

Hi guys,

If I load a DataFrame via a SQLContext with a SORT BY in the query and then
repartition the DataFrame, will the sort order be kept within each
partition?


I want to repartition because I'm going to run a map that generates a lot of
data internally, so to avoid out-of-memory errors I need to create smaller
partitions.

The source of the table is a Parquet file.

sqlContext.sql("select * from tblx sort by colA")
  .repartition(defaultParallelism * 40)
  .map { row =>
    // ... make all the rows (placeholder: the real logic is elided)
    row
  }

Cheers,
~N
