But if you want to generate ids that are unique across ALL the records that you are going to see in a stream (which is potentially infinite), then you definitely need a number space larger than a long :)
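
For what it's worth, here is a minimal sketch of the batch-time-plus-index scheme mentioned below, mostly to show where a long runs out. `BatchIds` and `makeId` are hypothetical names, not Spark APIs, and the 32-bit split between time and index is an assumption: it only holds if batches are at least one second apart and no batch has more than 2^32 records.

```scala
// Hypothetical helper, not part of Spark: packs a batch time and a
// per-batch index into one Long so that ids sort by batch time.
object BatchIds {
  // Assumption: the low 32 bits hold the index within the batch.
  val IndexBits = 32
  val MaxIndex: Long = (1L << IndexBits) - 1

  def makeId(batchTimeMs: Long, index: Long): Long = {
    require(index >= 0 && index <= MaxIndex, s"index out of range: $index")
    // Assumption: second resolution, so batches less than 1s apart collide.
    val seconds = batchTimeMs / 1000
    // Only 31 bits are left for the time if the id is to stay non-negative.
    require(seconds >= 0 && seconds < (1L << 31), s"batch time out of range: $batchTimeMs")
    (seconds << IndexBits) | index
  }
}
```

In a streaming job this could presumably be called from the two-argument form of transform, e.g. dstream.transform((rdd, time) => rdd.zipWithIndex().map { case (rec, i) => (BatchIds.makeId(time.milliseconds, i), rec) }). The ids then sort by batch time, but the scheme is still bounded by the two require checks, unlike a UUID.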

TD

On Thu, Aug 28, 2014 at 12:48 PM, Soumitra Kumar <kumar.soumi...@gmail.com> wrote:

> Yes, that is an option.
>
> I started with a function of batch time, and index to generate id as long.
> This may be faster than generating UUID, with added benefit of sorting
> based on time.
>
> ----- Original Message -----
> From: "Tathagata Das" <tathagata.das1...@gmail.com>
> To: "Soumitra Kumar" <kumar.soumi...@gmail.com>
> Cc: "Xiangrui Meng" <men...@gmail.com>, user@spark.apache.org
> Sent: Thursday, August 28, 2014 2:19:38 AM
> Subject: Re: Spark Streaming: DStream - zipWithIndex
>
> If you just want an arbitrary unique id attached to each record in a dstream
> (no ordering etc), then why not generate and attach a UUID to each
> record?
>
> On Wed, Aug 27, 2014 at 4:18 PM, Soumitra Kumar <kumar.soumi...@gmail.com>
> wrote:
>
> I see an issue here.
>
> If rdd.id is 1000 then rdd.id * 1e9.toLong would be BIG.
>
> I wish there was DStream mapPartitionsWithIndex.
>
> On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
> You can use RDD id as the seed, which is unique in the same spark
> context. Suppose none of the RDDs would contain more than 1 billion
> records. Then you can use
>
> rdd.zipWithUniqueId().mapValues(uid => rdd.id * 1e9.toLong + uid)
>
> Just a hack ..
>
> On Wed, Aug 27, 2014 at 2:59 PM, Soumitra Kumar
> <kumar.soumi...@gmail.com> wrote:
> > So, I guess zipWithUniqueId will be similar.
> >
> > Is there a way to get unique index?
> >
> > On Wed, Aug 27, 2014 at 2:39 PM, Xiangrui Meng <men...@gmail.com>
> > wrote:
> >>
> >> No. The indices start at 0 for every RDD. -Xiangrui
> >>
> >> On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar
> >> <kumar.soumi...@gmail.com> wrote:
> >> > Hello,
> >> >
> >> > If I do:
> >> >
> >> > DStream transform {
> >> >   rdd.zipWithIndex.map {
> >> >
> >> > Is the index guaranteed to be unique across all RDDs here?
> >> >
> >> >   }
> >> > }
> >> >
> >> > Thanks,
> >> > -Soumitra.