Re: Splitting into partitions and sorting the partitions ... how to do that?

2013-12-04 Thread Ashish Rangole
I am not sure if 32 partitions is a hard limit that you have. Unless you have a strong reason to use only 32 partitions, please try providing the second optional argument (numPartitions) to reduceByKey and sortByKey methods which will paralellize these Reduce operations. A number 3x the number of

Re: Splitting into partitions and sorting the partitions ... how to do that?

2013-12-04 Thread Ceriel Jacobs
Thanks for your answer. But the problem is that I only want to sort the 32 partitions, individually, not the complete input. So yes, the output has to consist of 32 partitions, each sorted. Ceriel Jacobs On 12/04/2013 06:30 PM, Ashish Rangole wrote: I am not sure if 32 partitions is a hard

Re: Splitting into partitions and sorting the partitions ... how to do that?

2013-12-04 Thread Ceriel Jacobs
Thanks for your answer. The partitioning function is not that important. What is important that I only sort the partitions, not the complete RDD. Your suggestion to use rdd.distinct.coalesce(32).mapPartitions(p = sorted(p)) sounds nice, and I had indeed seen the coalesce method and the