Hi Ayan,

How will I get column-wise distinct items using this approach?
On Mon, Sep 19, 2016 at 3:31 PM, ayan guha <guha.a...@gmail.com> wrote:

> Create an array out of the columns, convert it to a DataFrame,
> then explode, distinct, write.
>
> On 19 Sep 2016 19:11, "Saurav Sinha" <sauravsinh...@gmail.com> wrote:
>
>> You can use distinct over your DataFrame or RDD:
>>
>> rdd.distinct
>>
>> It will give you the distinct rows.
>>
>> On Mon, Sep 19, 2016 at 2:35 PM, Abhishek Anand <abhis.anan...@gmail.com>
>> wrote:
>>
>>> I have an RDD which contains 14 different columns. I need to find the
>>> distinct values across all the columns of the RDD and write them to HDFS.
>>>
>>> How can I achieve this?
>>>
>>> Is there any distributed data structure that I can use and keep
>>> updating as I traverse the new rows?
>>>
>>> Regards,
>>> Abhi
>>
>> --
>> Thanks and Regards,
>>
>> Saurav Sinha
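For reference, here is a minimal sketch of the "explode, then distinct" idea that still keeps the result column-wise. The key assumption (mine, not necessarily what Ayan meant) is that each value is tagged with its column index before the distinct step, so equal values from different columns stay apart. It uses plain Python lists instead of Spark so the logic can be checked in isolation; the function names are hypothetical.

```python
from collections import defaultdict


def explode_to_pairs(rows):
    """Emit one (column_index, value) pair per cell, like a flatMap/explode."""
    for row in rows:
        for col_idx, value in enumerate(row):
            yield (col_idx, value)


def distinct_per_column(rows):
    """Return {column_index: set of distinct values in that column}."""
    # distinct on (column, value) pairs; the column tag keeps the same
    # value appearing in two different columns as two separate entries
    distinct_pairs = set(explode_to_pairs(rows))

    # regroup the distinct pairs by column (a groupByKey in Spark terms)
    by_column = defaultdict(set)
    for col_idx, value in distinct_pairs:
        by_column[col_idx].add(value)
    return dict(by_column)


rows = [("a", 1, "x"), ("b", 1, "x"), ("a", 2, "y")]
print(distinct_per_column(rows))
```

On an RDD the same shape would be roughly rdd.flatMap(lambda row: list(enumerate(row))).distinct(), which can be written out as (column, value) pairs directly or grouped per column with groupByKey() first. I have not run that Spark variant here, so treat it as a starting point rather than a tested recipe.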