Now I found the root cause is a Wrapper class in AnyRef is not Serializable, but even though I changed it to implements Serializable. the 'rows' still cannot get data... Any suggestion?
On Fri, Jun 23, 2017 at 10:56 AM, Lionel Luffy <lionelu...@gmail.com> wrote: > Hi there, > I'm trying to do below action while it always return > java.io.NotSerializableException > in the shuffle task. > I've checked that Array is serializable. how can I get the data of rdd in > newRDD? > > step 1: val rdd: RDD[(AnyRef, Array[AnyRef]] {......} > > step2 : rdd > .partitionBy(partitioner) > .map(_._2) > > step3: pass rdd to newRDD as prev: > newRDD[K, V] ( > xxx, > xxx, > xxx, > prev: RDD[Array[AnyRef]] extends RDD[(K, V)] (prev) { > > override protected def getPartitions() {...} > > override def compute(split: Partition, context: TaskContext): Iterator[(K, > V)] {... > val rows = firstParent[Array[AnyRef]].iterator(split, context) > > } > > } > > > Thanks, > LL >