Partition Case Class RDD without PairRDDFunctions

Night Wolf Wed, 06 May 2015 02:15:07 -0700

Hi,

If I have an RDD[MyClass] and I want to partition it by the hash code of
MyClass for performance reasons, is there any way to do this without
converting it into a pair RDD, RDD[(K, V)], and calling partitionBy?


Mapping it to a Tuple2 seems like a waste of space and computation.
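
For reference, the tuple-mapping route described above can be sketched roughly
as follows. This is only an illustration, assuming Spark 1.3 or later running
in local mode; the fields of MyClass, the partition count of 4, and the object
name PartitionByHashSketch are all made up for the example.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for MyClass; the fields are assumptions.
// A case class gets a structural hashCode, which HashPartitioner uses.
case class MyClass(id: Int, name: String)

object PartitionByHashSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("partition-sketch")
      .setMaster("local[2]") // local mode, for illustration only
    val sc = new SparkContext(conf)
    try {
      val caseClassRdd: RDD[MyClass] =
        sc.parallelize((1 to 100).map(i => MyClass(i, s"name-$i")))

      // Key each element by itself with a null value so that
      // PairRDDFunctions.partitionBy becomes available, shuffle by the
      // key's hash code, then drop the placeholder value again.
      val partitioned: RDD[MyClass] =
        caseClassRdd
          .map(x => (x, null))
          .partitionBy(new HashPartitioner(4)) // 4 is an arbitrary choice
          .keys

      println(partitioned.partitions.length)
    } finally {
      sc.stop()
    }
  }
}
```

One thing to keep in mind with this sketch: keys is a plain map underneath, so
the resulting RDD no longer reports a partitioner even though the records stay
where the shuffle placed them.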

It looks like PairRDDFunctions.partitionBy() uses a ShuffledRDD[K, V, C],
which requires K, V and C. Could I create a new
ShuffledRDD[MyClass, MyClass, MyClass](caseClassRdd, new HashPartitioner)?

Cheers,
N
