Since data.length varies per element, I am not sure whether mixing
data.length and the per-element index makes sense.
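If what you want is each element paired with the cumulative length of everything before it (rather than a per-element count), one way that avoids touching ZippedWithIndexRDD is a two-pass approach over the RDD itself. A rough sketch, assuming String elements and a hypothetical helper name zipWithCumulativeLength (not an existing Spark API):

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper: pair each element with the total length of all
// elements that precede it across the whole RDD, instead of its ordinal.
def zipWithCumulativeLength(rdd: RDD[String]): RDD[(String, Long)] = {
  // Pass 1: total length held by each partition, collected to the driver.
  val partLengths: Array[Long] = rdd
    .mapPartitionsWithIndex { (i, it) =>
      Iterator((i, it.map(_.length.toLong).sum))
    }
    .collect()
    .sortBy(_._1)
    .map(_._2)

  // Exclusive prefix sum: start offset of each partition.
  val startOffsets: Array[Long] = partLengths.scanLeft(0L)(_ + _)

  // Pass 2: walk each partition, accumulating lengths from its offset.
  rdd.mapPartitionsWithIndex { (i, it) =>
    var offset = startOffsets(i)
    it.map { s =>
      val pair = (s, offset)
      offset += s.length
      pair
    }
  }
}
```

This mirrors how zipWithIndex itself works (a first job to compute per-partition counts, then a second pass applying offsets), just with element lengths substituted for counts.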

Can you describe your use case in a bit more detail?

Thanks

On Tue, Jun 28, 2016 at 11:34 AM, Punit Naik <naik.puni...@gmail.com> wrote:

> Hi Ted
>
> So would the tuple look like: (x._1, split.startIndex + x._2 +
> x._1.length) ?
>
> On Tue, Jun 28, 2016 at 11:09 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Please take a look at:
>> core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
>>
>> In compute() method:
>>     val split = splitIn.asInstanceOf[ZippedWithIndexRDDPartition]
>>     firstParent[T].iterator(split.prev, context).zipWithIndex.map { x =>
>>       (x._1, split.startIndex + x._2)
>>
>> You can modify the second component of the tuple to take data.length
>> into account.
>>
>> On Tue, Jun 28, 2016 at 10:31 AM, Punit Naik <naik.puni...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> I wanted to change the behaviour of the "zipWithIndex" function for
>>> Spark RDDs so that, just as an example, it outputs
>>> "(data, prev_index + data.length)" instead of "(data, prev_index + 1)".
>>>
>>> How can I do this?
>>>
>>> --
>>> Thank You
>>>
>>> Regards
>>>
>>> Punit Naik
>>>
>>
>>
>
>
> --
> Thank You
>
> Regards
>
> Punit Naik
>