I am not sure if 32 partitions is a hard limit that you have.
Unless you have a strong reason to use only 32 partitions, please try
providing the second optional
argument (numPartitions) to reduceByKey and sortByKey methods which will
paralellize these Reduce operations.
A number 3x the number of
Thanks for your answer. But the problem is that I only want to sort the 32
partitions, individually,
not the complete input. So yes, the output has to consist of 32 partitions,
each sorted.
Ceriel Jacobs
On 12/04/2013 06:30 PM, Ashish Rangole wrote:
I am not sure if 32 partitions is a hard
Thanks for your answer.
The partitioning function is not that important. What is important that I only
sort the partitions,
not the complete RDD. Your suggestion to use
rdd.distinct.coalesce(32).mapPartitions(p = sorted(p))
sounds nice, and I had indeed seen the coalesce method and the