Re: Partition Case Class RDD without ParRDDFunctions

2015-05-07 Thread Jonathan Coveney
What about .groupBy doesn't work for you? 2015-05-07 8:17 GMT-04:00 Night Wolf : > MyClass is a basic scala case class (using Spark 1.3.1); > > case class Result(crn: Long, pid: Int, promoWk: Int, windowKey: Int, ipi: > Double) { > override def hashCode(): Int = crn.hashCode() > } > > > On Wed

Re: Partition Case Class RDD without ParRDDFunctions

2015-05-07 Thread Night Wolf
MyClass is a basic Scala case class (using Spark 1.3.1): case class Result(crn: Long, pid: Int, promoWk: Int, windowKey: Int, ipi: Double) { override def hashCode(): Int = crn.hashCode() } On Wed, May 6, 2015 at 8:09 PM, ayan guha wrote: > How does your MyClqss looks like? I was experimentin
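
For readers following the thread, here is a minimal sketch of what overriding hashCode on the case class implies for hash-based partitioning: rows that share a crn map to the same partition index. It is plain Scala with no Spark dependency. The helper partitionFor, the partition count of 4, and the sample rows are assumptions made up for illustration; the non-negative modulo mirrors how a hash partitioner typically turns a hash code into an index, and is not taken from the messages above.

```scala
// Same case class as in the message above: hashCode depends only on crn.
case class Result(crn: Long, pid: Int, promoWk: Int, windowKey: Int, ipi: Double) {
  override def hashCode(): Int = crn.hashCode()
}

object HashPartitionSketch {
  // Hypothetical helper (not a Spark API): non-negative modulo of the hash code,
  // so a negative hash code still maps to a valid partition index.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  def main(args: Array[String]): Unit = {
    val numPartitions = 4 // assumed partition count, for illustration only
    val rows = Seq(
      Result(1001L, 1, 201518, 3, 0.25), // made-up sample data
      Result(1001L, 2, 201519, 4, 0.75), // same crn as the first row
      Result(-7L, 1, 201518, 3, 0.5)
    )
    // Rows with the same crn print the same partition index.
    rows.foreach { r =>
      println(s"crn=${r.crn} -> partition ${partitionFor(r, numPartitions)}")
    }
  }
}
```

Because equals is left at the case-class default, two rows with the same crn but different other fields are still distinct values; only their hash codes, and therefore their partition indices, coincide.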

Re: Partition Case Class RDD without ParRDDFunctions

2015-05-06 Thread ayan guha
What does your MyClass look like? I was experimenting with the Row class in Python, and apparently partitionBy automatically takes the first column as the key. However, I am not sure how you can access part of an object without deserializing it (either explicitly or with Spark doing it for you). On Wed, May 6,

Partition Case Class RDD without ParRDDFunctions

2015-05-06 Thread Night Wolf
Hi, If I have an RDD[MyClass] and I want to partition it by the hash code of MyClass for performance reasons, is there any way to do this without converting it into a pair RDD, RDD[(K, V)], and calling partitionBy? Mapping it to a Tuple2 seems like a waste of space/computation. It looks like the P