What about .groupBy doesn't work for you?
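(A minimal sketch of the .groupBy route, assuming a placeholder RDD named results of the Result type shown below; note that groupBy is implemented as a map to (key, value) pairs followed by groupByKey, so it still builds pairs internally:)

import org.apache.spark.rdd.RDD

// results: RDD[Result] -- placeholder; Result is the case class quoted below.
// groupBy keys each element by the supplied function and shuffles, producing
// one (hash, group) pair per distinct key.
val grouped: RDD[(Int, Iterable[Result])] = results.groupBy(_.hashCode())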
2015-05-07 8:17 GMT-04:00 Night Wolf:
MyClass is a basic Scala case class (using Spark 1.3.1):

case class Result(crn: Long, pid: Int, promoWk: Int, windowKey: Int, ipi: Double) {
  override def hashCode(): Int = crn.hashCode()
}
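(A rough sketch of why that override matters: HashPartitioner places records by the key's hashCode, so keying an RDD by the Result object itself groups records by crn; resultsRdd and the partition count below are placeholders:)

import org.apache.spark.HashPartitioner

// resultsRdd: RDD[Result] -- placeholder name.
// Use the Result object itself as the key; HashPartitioner hashes the key via
// hashCode(), and the override above delegates that to crn.hashCode(), so all
// records sharing a crn land in the same partition.
val byCrn = resultsRdd
  .map(r => (r, ()))                     // dummy Unit value; the key carries the record
  .partitionBy(new HashPartitioner(8))   // partition count chosen arbitrarily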
On Wed, May 6, 2015 at 8:09 PM, ayan guha wrote:
What does your MyClass look like? I was experimenting with the Row class in
Python, and apparently partitionBy automatically takes the first column as the
key. However, I am not sure how you can access a part of an object without
deserializing it (either explicitly, or Spark doing it for you).
On Wed, May 6,
Hi,
If I have an RDD[MyClass] and I want to partition it by the hash code of
MyClass for performance reasons, is there any way to do this without
converting it into a pair RDD (RDD[(K, V)]) and calling partitionBy?
Mapping it to a Tuple2 seems like a waste of space/computation.
It looks like the P
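(For completeness, a sketch of the pair-RDD workaround mentioned above; in Spark 1.3.1 partitionBy is only available on key/value RDDs, so the usual approach is keyBy plus partitionBy. Variable names and the partition count are placeholders:)

import org.apache.spark.HashPartitioner

// rdd: RDD[MyClass] -- placeholder.
// keyBy builds (hashCode, record) pairs and partitionBy shuffles them with a
// HashPartitioner. Calling .values afterwards drops the keys again; the data
// stays where the shuffle put it, but the resulting RDD no longer carries a
// Partitioner object because values is a plain map.
val partitioned = rdd
  .keyBy(_.hashCode())
  .partitionBy(new HashPartitioner(16))

val unkeyed = partitioned.values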