> Cc: "Xiangrui Meng" , user@spark.apache.org
> Sent: Thursday, August 28, 2014 2:19:38 AM
> Subject: Re: Spark Streaming: DStream - zipWithIndex

If you just want an arbitrary unique id attached to each record in a DStream (no
ordering etc.), then why not generate and attach a UUID to each record?
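A minimal sketch of the UUID approach, assuming the Spark Streaming Scala API; the names `withUuids` and `lines` are illustrative, not from the thread:

```scala
import java.util.UUID

import org.apache.spark.streaming.dstream.DStream

// Attach a random UUID to every record. The ids are unique with
// overwhelming probability, but carry no ordering information.
def withUuids(lines: DStream[String]): DStream[(String, String)] =
  lines.map(record => (UUID.randomUUID().toString, record))
```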
On Wed, Aug 27, 2014 at 4:18 PM, Soumitra Kumar wrote:
I see an issue here.

If rdd.id is 1000 then rdd.id * 1e9.toLong would be BIG.

I wish there was a DStream mapPartitionsWithIndex.
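For a sense of scale, the composed ids are large but still well inside a 64-bit Long; a quick magnitude check in plain Scala (no Spark needed):

```scala
// How big do the composed ids get for rdd.id == 1000?
val rddId = 1000L
val base  = rddId * 1e9.toLong      // 1,000,000,000,000
val maxId = base + (1e9.toLong - 1) // last id this batch could hand out
// Long.MaxValue is 9,223,372,036,854,775,807, so the scheme only
// overflows once rdd.id exceeds Long.MaxValue / 1e9, about 9.2 billion.
```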
On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng wrote:
Yeah - each batch will produce a new RDD.
On Wed, Aug 27, 2014 at 3:33 PM, Soumitra Kumar wrote:
Thanks.
Just to double check, rdd.id would be unique for a batch in a DStream?
On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng wrote:
You can use the RDD id as the seed, which is unique in the same Spark
context. Suppose none of the RDDs contains more than 1 billion
records. Then you can use
rdd.zipWithUniqueId().mapValues(uid => rdd.id * 1e9.toLong + uid)
Just a hack.
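This hack can be wired into a stream with transform; a sketch assuming the Spark Streaming Scala API, with `withGlobalIds` and `stream` as hypothetical names:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Ids unique across batches: high digits come from the per-context
// RDD id, low digits from zipWithUniqueId. Assumes fewer than 1e9
// records per batch.
def withGlobalIds(stream: DStream[String]): DStream[(String, Long)] =
  stream.transform { rdd: RDD[String] =>
    val base = rdd.id * 1e9.toLong // read the id before the closure
    rdd.zipWithUniqueId().mapValues(uid => base + uid)
  }
```

Copying rdd.id into a local val keeps the mapValues closure from capturing the RDD itself, which is not serializable.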
On Wed, Aug 27, 2014 at 2:59 PM, Soumitra Kumar wrote:
So, I guess zipWithUniqueId will be similar.

Is there a way to get a unique index?
On Wed, Aug 27, 2014 at 2:39 PM, Xiangrui Meng wrote:
No. The indices start at 0 for every RDD. -Xiangrui
On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar wrote:
Hello,

If I do:

DStream transform {
  rdd.zipWithIndex.map {
  }
}

Is the index guaranteed to be unique across all RDDs here?

Thanks,
-Soumitra.
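The per-RDD restart described above is easy to confirm outside of streaming; a local-mode sketch (master and app name are arbitrary):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Every RDD's zipWithIndex starts over at 0, so indices are only
// unique within a single RDD, not across the batches of a DStream.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("zip-demo"))
val batch1 = sc.parallelize(Seq("a", "b", "c")).zipWithIndex().collect()
val batch2 = sc.parallelize(Seq("d", "e")).zipWithIndex().collect()
// batch1: (a,0), (b,1), (c,2)   batch2: (d,0), (e,1)
sc.stop()
```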