Additionally, there is a built-in "uuid" function available, if that helps your use case.
Akshay Bhardwaj

On Thu, Jun 6, 2019 at 3:18 PM Akshay Bhardwaj <akshay.bhardwaj1...@gmail.com> wrote:
> Hi Marcelo,
>
> If you are using Spark 2.3+ and the Dataset API/Spark SQL, you can use the
> built-in function "monotonically_increasing_id" in Spark.
> A little tweaking with Spark SQL built-in functions lets you achieve this
> without writing custom code or defining RDDs with map/reduce functions.
>
> Akshay Bhardwaj
>
> On Thu, May 30, 2019 at 4:05 AM Marcelo Valle <marcelo.va...@ktech.com> wrote:
>> Hi all,
>>
>> I am new to Spark and I am trying to write an application using
>> DataFrames that normalizes data.
>>
>> So I have a DataFrame `denormalized_cities` with 3 columns: COUNTRY,
>> CITY, CITY_NICKNAME.
>>
>> Here is what I want to do:
>>
>> 1. Map by country, then for each country generate a new ID and write
>>    it to a new DataFrame `countries`, which would have COUNTRY_ID, COUNTRY.
>>    The country ID would be generated, probably using
>>    `monotonically_increasing_id`.
>> 2. For each country, write several rows to a new DataFrame `cities`,
>>    which would have COUNTRY_ID, ID, CITY, CITY_NICKNAME. COUNTRY_ID would be
>>    the same one generated for the countries table, and ID would be another
>>    ID I generate.
>>
>> What's the best way to do this, ideally using only DataFrames (no low-level
>> RDDs) unless that's not possible?
>>
>> I can clearly see a map/reduce process where, for each mapped KEY, I generate
>> a row in the countries table with COUNTRY_ID, and for every value I write a
>> row in the cities table. But how can I implement this in an easy and
>> efficient way?
>>
>> I thought about using a `GroupBy Country` and then using `collect` to
>> collect all values for that country, but then I don't know how to generate
>> the country ID, and I am not sure about the memory efficiency of `collect`
>> for a country with too many cities (bear in mind country/city is just an
>> example; my real entities are different).
>>
>> Could anyone point me in the direction of a good solution?
>>
>> Thanks,
>> Marcelo.