Hi guys,
If I load a DataFrame through a SQLContext using a query that has a SORT BY,
and I then repartition the DataFrame, will it keep the sort order within each
partition?
I want to repartition because I'm going to run a map that generates a lot of
data internally, so I need smaller partitions to avoid out-of-memory errors.
The source of the table is a Parquet file.
sqlContext.sql("select * from tblx sort by colA")
  .repartition(defaultParallelism * 40)
  .map { row =>
    // ..make all the rows
  }
Cheers,
~N