I see. Thanks Akhil.
On Thu, Jul 31, 2014 at 6:08 PM, Akhil Das <[email protected]> wrote:

> Hi,
>
> According to the documentation
> (http://spark.apache.org/docs/1.0.0/api/java/index.html), coalesce has the
> signature
>
>     coalesce(int numPartitions, boolean shuffle, scala.math.Ordering<T> ord)
>
> "Return a new RDD that is reduced into numPartitions partitions."
>
> You could try something like the following, passing the Ordering
> explicitly (in Scala it sits in a second, implicit parameter list):
>
>     val rdd: RDD[(WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
>         Externalizer[KeyValue])] = ...
>     val rdd_coalesced =
>       rdd.coalesce(Math.min(1000, rdd.partitions.length), false)(null)
>
> Thanks
> Best Regards
>
> On Thu, Jul 31, 2014 at 7:15 AM, Jianshi Huang <[email protected]> wrote:
>
>> In my code I have something like
>>
>>     val rdd: RDD[(WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
>>         Externalizer[KeyValue])] = ...
>>     val rdd_coalesced = rdd.coalesce(Math.min(1000, rdd.partitions.length))
>>
>> My purpose is to limit the number of partitions (a later sortByKey kept
>> failing with "too many open files").
>>
>> However, it won't compile; the Scala compiler complains "erroneous and
>> inaccessible type".
>>
>> What's the problem? BTW, I found that coalesce requires an implicit
>> Ordering. Why does it need that?
>>
>> I'm currently using repartition, which compiles fine, but the doc says it
>> always shuffles and recommends coalesce for reducing the number of
>> partitions.
>>
>> Can anyone help me here?
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
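For anyone hitting the same error: the call shape of Akhil's workaround can be sketched without a Spark cluster. This is a minimal, hypothetical stand-in (MyRDD and Opaque are made-up names, not Spark classes) that mirrors the Spark 1.0 coalesce signature, where the Ordering is an implicit parameter in a second parameter list defaulting to null. Passing null there explicitly bypasses implicit resolution for element types that have no usable Ordering.

```scala
// Hypothetical stand-in for RDD, mirroring coalesce's two parameter lists
// in Spark 1.0: (numPartitions, shuffle)(implicit ord = null).
class MyRDD[T](val partitions: Int) {
  def coalesce(numPartitions: Int, shuffle: Boolean = false)
              (implicit ord: Ordering[T] = null): MyRDD[T] =
    // Only reduce the partition count; never grow it without a shuffle.
    new MyRDD[T](math.min(numPartitions, partitions))
}

// A type with no Ordering in scope, standing in for Externalizer[KeyValue].
class Opaque

val rdd = new MyRDD[Opaque](5000)

// Supplying null in the second parameter list sidesteps the compiler's
// implicit Ordering search entirely.
val rdd_coalesced =
  rdd.coalesce(math.min(1000, rdd.partitions), shuffle = false)(null)

println(rdd_coalesced.partitions)
```

The explicit `(null)` is safe here because coalesce only consults the Ordering when it actually shuffles; with `shuffle = false` it is never used.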
