Hi,

If I have an RDD[MyClass] and I want to partition it by the hash code of
MyClass for performance reasons, is there any way to do this without
converting it into a pair RDD (RDD[(K, V)]) and calling partitionBy?

Mapping each element to a Tuple2 just to partition it seems like a waste
of space and computation.
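
For concreteness, the tuple-based approach in question would look roughly
like the sketch below. The fields of MyClass, the sample data, the local
master setting and the partition count of 8 are all placeholders chosen
for illustration, not taken from any real job.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical stand-in for MyClass; the fields are placeholders.
case class MyClass(id: Int, name: String)

object PartitionByHashSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("partition-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val caseClassRdd = sc.parallelize(Seq(MyClass(1, "a"), MyClass(2, "b"), MyClass(3, "c")))

    // Wrap each element as a key with a null value, let HashPartitioner
    // place it by the key's hashCode, then drop the dummy value again.
    val partitioned = caseClassRdd
      .map(x => (x, null))
      .partitionBy(new HashPartitioner(8)) // 8 is an arbitrary example count
      .keys

    println(partitioned.partitions.length)
    sc.stop()
  }
}
```

On older Spark versions the pair-RDD implicits may need an explicit
import of org.apache.spark.SparkContext._ for partitionBy and keys to
resolve.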

It looks like PairRDDFunctions.partitionBy() uses a ShuffledRDD[K, V, C],
which requires the type parameters K, V and C. Could I create a new
ShuffledRDD[MyClass, MyClass, MyClass](caseClassRdd, new HashPartitioner)
directly?

Cheers,
N
