Since data.length is variable, I am not sure whether mixing data.length and the index makes sense.

Can you describe your use case in a bit more detail? Thanks

On Tue, Jun 28, 2016 at 11:34 AM, Punit Naik <naik.puni...@gmail.com> wrote:

> Hi Ted
>
> So would the tuple look like: (x._1, split.startIndex + x._2 + x._1.length)?
>
> On Tue, Jun 28, 2016 at 11:09 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Please take a look at:
>> core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
>>
>> In the compute() method:
>>
>> val split = splitIn.asInstanceOf[ZippedWithIndexRDDPartition]
>> firstParent[T].iterator(split.prev, context).zipWithIndex.map { x =>
>>   (x._1, split.startIndex + x._2)
>> }
>>
>> You can modify the second component of the tuple to take data.length
>> into account.
>>
>> On Tue, Jun 28, 2016 at 10:31 AM, Punit Naik <naik.puni...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I wanted to change the functioning of the "zipWithIndex" function for
>>> Spark RDDs so that its output is, for example,
>>> "(data, prev_index + data.length)" instead of "(data, prev_index + 1)".
>>>
>>> How can I do this?
>>>
>>> --
>>> Thank You
>>>
>>> Regards
>>>
>>> Punit Naik
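The offset arithmetic being discussed can be sketched in plain Scala, without Spark. `zipWithCumulativeLength` below is a hypothetical helper (not part of the Spark API), assuming the elements are Strings: instead of pairing each element with its ordinal index, it pairs each element with the running sum of the lengths of all elements before it, which is roughly what replacing `split.startIndex + x._2` with a length-based offset in compute() would produce per partition.

```scala
// Hypothetical sketch: pair each element with the cumulative length of all
// preceding elements, rather than with its ordinal index as zipWithIndex does.
// startOffset would play the role of split.startIndex for a given partition.
def zipWithCumulativeLength(data: Seq[String], startOffset: Long = 0L): Seq[(String, Long)] = {
  // scanLeft yields the running total of lengths *before* each element
  // (it has one extra trailing element, which zip discards)
  val offsets = data.scanLeft(startOffset)((acc, s) => acc + s.length)
  data.zip(offsets)
}

// zipWithCumulativeLength(Seq("ab", "cde", "f"))
// → Seq(("ab", 0), ("cde", 2), ("f", 5))
```

If the intent is instead an offset that already includes the current element's length, `offsets.tail` can be zipped with the data. Note that in a real RDD version, the per-partition start offsets would themselves have to be computed from the total lengths of the preceding partitions, not from their element counts as ZippedWithIndexRDD does today.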