subject:"Dataset , RDD zipWithIndex -- How to use as a map ."

Re: Dataset , RDD zipWithIndex -- How to use as a map .

2016-07-22 Thread Pedro Rodriguez

You could either use monotonically_increasing_id or use a window function with rank. The first is a simple Spark SQL function; for the second, Databricks has a pretty helpful post on how to use window functions (in this case the whole data set is the window).
On Fri, Jul 22, 2016 at 12:20 PM, Marco Mistroni
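One thing to keep in mind when choosing between the two: monotonically_increasing_id gives unique ids, but not consecutive ones. The Spark API docs describe the current implementation as putting the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. A minimal plain-Java model of that layout might look like the following. It does not use Spark at all, and the class and method names are made up for illustration.

```java
// Plain-Java model only: no Spark dependency, all names are hypothetical.
final class MonotonicIdSketch {

    // Partition ID goes in the upper 31 bits, the record number within
    // the partition goes in the lower 33 bits.
    static long idFor(int partitionId, long recordNumber) {
        return ((long) partitionId << 33) + recordNumber;
    }

    public static void main(String[] args) {
        // Model two partitions holding three records each.
        for (int p = 0; p < 2; p++) {
            for (long r = 0; r < 3; r++) {
                System.out.println("partition " + p + ", record " + r + " -> " + idFor(p, r));
            }
        }
    }
}
```

With two partitions of three records each, the ids come out as 0, 1, 2 and then 8589934592, 8589934593, 8589934594. So they work as unique keys, but not as a dense 0..n-1 index; for that, a window function over the whole data set or zipWithIndex is likely the better fit.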

Re: Dataset , RDD zipWithIndex -- How to use as a map .

2016-07-22 Thread Marco Mistroni

Hi,
So you have a data frame, then use zipWithIndex and create a tuple. I'm not sure if the DataFrame API has something useful for zipWithIndex, but you can:
- get a data frame
- convert it to an RDD (there's rdd, or toJavaRDD() from Java)
- do a zipWithIndex
That will give you an RDD with 3 fields... I don't think you can update df
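To make the end result of those steps concrete: in the Java API, javaRDD() followed by zipWithIndex() pairs every element with a Long index from 0 to n-1, and collectAsMap() on the resulting pair RDD brings it back to the driver as a java.util.Map. The sketch below is only a plain-Java model of that outcome, with a List standing in for the RDD. It does not call Spark, the class and method names are hypothetical, and the business names are made-up sample values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model only: a List stands in for the RDD, names are hypothetical.
final class ZipWithIndexSketch {

    // Pair each name with its position (0, 1, 2, ...) and keep the result as a map.
    static Map<String, Long> indexByName(List<String> names) {
        Map<String, Long> index = new LinkedHashMap<>();
        long next = 0L;
        for (String name : names) {
            // If a name repeats, the later index overwrites the earlier one,
            // so only one index per name survives in the map.
            index.put(name, next++);
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample names, standing in for the "name" column.
        List<String> names = Arrays.asList("Cafe Uno", "Book Nook", "Taco Stand");
        Map<String, Long> index = indexByName(names);
        System.out.println(index.get("Book Nook"));
    }
}
```

A map like this only makes sense if the names are distinct and the whole mapping fits in driver memory; otherwise keeping the (name, index) pairs distributed and joining against them is probably safer.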

Re: Dataset , RDD zipWithIndex -- How to use as a map .

2016-07-22 Thread VG

Hi All,
Any suggestions for this?
Regards, VG

On Fri, Jul 22, 2016 at 6:40 PM, VG wrote:
> Hi All,
>
> I am really confused how to proceed further. Please help.
>
> I have a dataset created as follows:
> Dataset b = sqlContext.sql("SELECT bid, name FROM business");
>
> Now I

Dataset , RDD zipWithIndex -- How to use as a map .

2016-07-22 Thread VG

Hi All,

I am really confused how to proceed further. Please help.

I have a dataset created as follows:
Dataset b = sqlContext.sql("SELECT bid, name FROM business");

Now I need to map each name to a unique index, and I did the following:
JavaPairRDD indexedBId = business.javaRDD()