You could either use monotonically_increasing_id or use a window function
with rank. The first is a simple Spark SQL function; Databricks has a
pretty helpful post on how to use window functions (in this case the whole
data set is the window).
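A rough sketch of both options, assuming the Spark 2.x Java API. The class name IndexColumns and the column names "id" and "idx" are made up for illustration. I used row_number here rather than rank, since rank gives tied rows the same value.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.monotonically_increasing_id;
import static org.apache.spark.sql.functions.row_number;

// Hypothetical helper class, not part of Spark.
public class IndexColumns {

    // Option 1: ids are unique and increasing, but NOT consecutive,
    // because the partition id is encoded in the upper bits.
    public static Dataset<Row> withUniqueId(Dataset<Row> df) {
        return df.withColumn("id", monotonically_increasing_id());
    }

    // Option 2: consecutive 1-based numbers. With no partitionBy the whole
    // data set is one window, so all rows go through a single partition.
    public static Dataset<Row> withRowNumber(Dataset<Row> df, String orderColumn) {
        WindowSpec wholeDataset = Window.orderBy(col(orderColumn));
        return df.withColumn("idx", row_number().over(wholeDataset));
    }
}
```

For the data set in the question that would be something like IndexColumns.withRowNumber(b, "name"). The window version is only reasonable if the data fits comfortably through one partition.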
On Fri, Jul 22, 2016 at 12:20 PM, Marco Mistroni wrote:
Hi,
So you have a data frame; use zipWithIndex and create a tuple.
I'm not sure if the DataFrame API has something useful for zip with index.
But you can:
- get a data frame
- convert it to an RDD (there's javaRDD() / toJavaRDD())
- do a zipWithIndex
That will give you an RDD with 3 fields...
I don't think you can update a DataFrame in place.
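The steps above could look roughly like this, assuming the Spark 2.x Java API and Java 8 lambdas. The class name ZipWithIndexExample and the column name "idx" are made up for illustration; the result is a new data frame rather than an update of the old one.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class, not part of Spark.
public class ZipWithIndexExample {

    // Returns a NEW data frame with a 0-based "idx" column appended.
    public static Dataset<Row> withIndex(Dataset<Row> df) {
        JavaRDD<Row> indexedRows = df.javaRDD()
            .zipWithIndex()                  // JavaPairRDD<Row, Long>
            .map(pair -> {
                Row row = pair._1();
                // Copy the existing fields and append the index as the last one.
                Object[] values = new Object[row.length() + 1];
                for (int i = 0; i < row.length(); i++) {
                    values[i] = row.get(i);
                }
                values[row.length()] = pair._2();
                return RowFactory.create(values);
            });

        // Original schema plus one non-nullable long column for the index.
        StructType schema = df.schema().add("idx", DataTypes.LongType, false);
        return df.sparkSession().createDataFrame(indexedRows, schema);
    }
}
```

With the two columns from the query (bid, name) this gives rows with 3 fields: bid, name and idx. Unlike monotonically_increasing_id, the indices from zipWithIndex are consecutive.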
Hi All,
Any suggestions for this?
Regards,
VG
On Fri, Jul 22, 2016 at 6:40 PM, VG wrote:
> Hi All,
>
> I am really confused how to proceed further. Please help.
>
> I have a dataset created as follows:
> Dataset b = sqlContext.sql("SELECT bid, name FROM business");
>
> Now I
Hi All,
I am really confused about how to proceed further. Please help.
I have a dataset created as follows:
Dataset b = sqlContext.sql("SELECT bid, name FROM business");
Now I need to map each name to a unique index, and I did the following:
JavaPairRDD indexedBId = business.javaRDD()