Yeah - each batch will produce a new RDD.

On Wed, Aug 27, 2014 at 3:33 PM, Soumitra Kumar <kumar.soumi...@gmail.com> wrote:
> Thanks.
>
> Just to double check, would rdd.id be unique for each batch in a DStream?
>
> On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> You can use the RDD id as the seed, which is unique within the same Spark
>> context. Suppose none of the RDDs contains more than 1 billion records.
>> Then you can use
>>
>>     rdd.zipWithUniqueId().mapValues(uid => rdd.id * 1e9.toLong + uid)
>>
>> Just a hack ...
>>
>> On Wed, Aug 27, 2014 at 2:59 PM, Soumitra Kumar
>> <kumar.soumi...@gmail.com> wrote:
>> > So I guess zipWithUniqueId will behave the same way.
>> >
>> > Is there a way to get a globally unique index?
>> >
>> > On Wed, Aug 27, 2014 at 2:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>
>> >> No. The indices start at 0 for every RDD. -Xiangrui
>> >>
>> >> On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar
>> >> <kumar.soumi...@gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > If I do:
>> >> >
>> >> >     dstream.transform { rdd =>
>> >> >       rdd.zipWithIndex.map { ... }
>> >> >     }
>> >> >
>> >> > is the index guaranteed to be unique across all RDDs here?
>> >> >
>> >> > Thanks,
>> >> > -Soumitra.
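To make the hack above concrete, here is a minimal sketch in plain Python (no Spark) of the arithmetic involved: zipWithUniqueId assigns the k-th item of partition p the local id k * numPartitions + p, and the hack prefixes that with the RDD's id scaled by a cap of 1e9. The helper names and the example batch data are hypothetical; the 1-billion-records-per-RDD assumption comes straight from the thread and is not enforced by Spark.

```python
def zip_with_unique_id(partitions):
    """Mimic RDD.zipWithUniqueId: the k-th item in partition p
    gets local id k * numPartitions + p (unique within one RDD)."""
    n = len(partitions)
    return [(item, k * n + p)
            for p, part in enumerate(partitions)
            for k, item in enumerate(part)]

def global_ids(rdd_id, partitions, cap=10**9):
    """The hack from the thread: rdd_id * cap + local uid.
    Unique across RDDs as long as every local uid stays below cap."""
    return [(item, rdd_id * cap + uid)
            for item, uid in zip_with_unique_id(partitions)]

# Two "batches" (distinct RDD ids) containing overlapping records:
batch1 = global_ids(7, [["a", "b"], ["c"]])
batch2 = global_ids(8, [["a", "d"]])
ids = [i for _, i in batch1 + batch2]
assert len(ids) == len(set(ids))  # no collisions across batches
```

Note that zipWithIndex alone would restart at 0 for every batch, which is exactly the collision the RDD-id prefix avoids.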