Thanks Sean. Is zipWithIndex available in the Java API? Also, how do I remove the generated ID from further processing?
Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Feb 14, 2014 at 9:14 PM, Sean Owen <so...@cloudera.com> wrote:

> You could do a zipWithIndex to add a sort of "row ID" to each element
> of the input RDD. Then after self-joining, exclude elements whose row
> ID is the same.
> --
> Sean Owen | Director, Data Science | London
>
>
> On Fri, Feb 14, 2014 at 3:42 PM, Sonal Goyal <sonalgoy...@gmail.com>
> wrote:
> > Hi,
> >
> > I have some PairRDDs like
> >
> > K1 A
> > K1 B
> > K1 C
> >
> > K2 D
> > K2 D
> > K2 E
> >
> > and I want to create
> >
> > A B
> > A C
> > B C
> > D D
> > D E
> >
> > What's the best way to do this? If I join the RDD with itself, I will end up
> > with A A, which I do not want. I can't do distinct, as that will filter out
> > the D D, which I want.
> >
> > Any pointers? Thanks.
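A minimal, Spark-free sketch of Sean's suggestion, assuming plain Python lists stand in for RDDs. This is not Spark API code, and the helpers `join` and `pairs_within_key` are hypothetical names; each step's comment only names the RDD operation it is meant to mirror. One deliberate variation: it keeps pairs whose first row ID is smaller than the second, rather than merely different, so A B is produced without also producing B A. The last step covers the second question: once the filter has used the row ID, a plain map discards it.

```python
from collections import defaultdict

# Hypothetical plain-Python stand-in for the RDD pipeline; not Spark code.


def join(left, right):
    """Inner join of two lists of (key, value) pairs, mimicking a pair-RDD join."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in right_by_key.get(key, [])]


def pairs_within_key(records):
    """Return value pairs that share a key, never pairing a row with itself."""
    # Step 1 (like zipWithIndex): attach a unique row ID to every record.
    zipped = [(record, idx) for idx, record in enumerate(records)]
    # Step 2: re-key so the row ID travels with the value: (key, (value, id)).
    keyed = [(key, (value, idx)) for (key, value), idx in zipped]
    # Step 3: self-join on the key.
    joined = join(keyed, keyed)
    # Step 4: keep id_a < id_b. This drops self-pairs such as A A and
    # keeps only one ordering of each pair (A B, not B A).
    filtered = [(key, pair) for key, pair in joined if pair[0][1] < pair[1][1]]
    # Step 5: drop the row IDs (and the key), keeping only the value pairs.
    return [(a, b) for _key, ((a, _id_a), (b, _id_b)) in filtered]


records = [("K1", "A"), ("K1", "B"), ("K1", "C"),
           ("K2", "D"), ("K2", "D"), ("K2", "E")]
print(pairs_within_key(records))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('D', 'D'), ('D', 'E'), ('D', 'E')]
```

Because the input has two D rows, each of them pairs with E, so D E appears twice. If, as in the desired output above, each value pair should appear once, a final distinct on the result (a `set` here) removes the repeat while still keeping D D, since that pair comes from two different rows.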