I see. Thanks Akhil.
On Thu, Jul 31, 2014 at 6:08 PM, Akhil Das <[email protected]> wrote:

> Hi,
>
> According to the documentation
> (http://spark.apache.org/docs/1.0.0/api/java/index.html), coalesce has the
> signature
>
>     coalesce(int numPartitions, boolean shuffle, scala.math.Ordering<T> ord)
>
> "Return a new RDD that is reduced into numPartitions partitions."
>
> You could try something like the following, passing the Ordering
> explicitly (in Scala it sits in a second, implicit parameter list):
>
>     val rdd: RDD[(WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
>         Externalizer[KeyValue])] = ...
>     val rdd_coalesced =
>       rdd.coalesce(Math.min(1000, rdd.partitions.length), false)(null)
>
> Thanks
> Best Regards
>
> On Thu, Jul 31, 2014 at 7:15 AM, Jianshi Huang <[email protected]> wrote:
>
>> In my code I have something like
>>
>>     val rdd: RDD[(WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
>>         Externalizer[KeyValue])] = ...
>>     val rdd_coalesced = rdd.coalesce(Math.min(1000, rdd.partitions.length))
>>
>> My purpose is to limit the number of partitions (a later sortByKey kept
>> failing with "too many open files").
>>
>> However, it won't compile; the Scala compiler complains "erroneous and
>> inaccessible type".
>>
>> What's the problem? BTW, I found that coalesce requires an implicit
>> Ordering. Why does it need that?
>>
>> I'm currently using repartition, which compiles fine, but the doc says it
>> always shuffles and recommends coalesce for reducing the number of
>> partitions.
>>
>> Can anyone help me here?
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
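For anyone hitting the same error: the call shape of Akhil's workaround can be sketched without a Spark cluster. This is a minimal, hypothetical stand-in (MyRDD and Opaque are made-up names, not Spark classes) that mirrors the Spark 1.0 coalesce signature, where the Ordering is an implicit parameter in a second parameter list defaulting to null. Passing null there explicitly bypasses implicit resolution for element types that have no usable Ordering.

```scala
// Hypothetical stand-in for RDD, mirroring coalesce's two parameter lists
// in Spark 1.0: (numPartitions, shuffle)(implicit ord = null).
class MyRDD[T](val partitions: Int) {
  def coalesce(numPartitions: Int, shuffle: Boolean = false)
              (implicit ord: Ordering[T] = null): MyRDD[T] =
    // Only reduce the partition count; never grow it without a shuffle.
    new MyRDD[T](math.min(numPartitions, partitions))
}

// A type with no Ordering in scope, standing in for Externalizer[KeyValue].
class Opaque

val rdd = new MyRDD[Opaque](5000)

// Supplying null in the second parameter list sidesteps the compiler's
// implicit Ordering search entirely.
val rdd_coalesced =
  rdd.coalesce(math.min(1000, rdd.partitions), shuffle = false)(null)

println(rdd_coalesced.partitions)
```

The explicit `(null)` is safe here because coalesce only consults the Ordering when it actually shuffles; with `shuffle = false` it is never used.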
