Thanks Sean. Is zipWithIndex available in the Java API? Also, how do I
remove the generated ID from further processing?

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Feb 14, 2014 at 9:14 PM, Sean Owen <so...@cloudera.com> wrote:

> You could do a zipWithIndex to add a sort of "row ID" to each element
> of the input RDD. Then after self-joining, exclude elements whose row
> ID is the same.
> --
> Sean Owen | Director, Data Science | London
>
>
> On Fri, Feb 14, 2014 at 3:42 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
> > Hi,
> >
> > I have some PairRDDs like
> >
> > K1 A
> > K1 B
> > K1 C
> >
> > K2 D
> > K2 D
> > K2 E
> >
> > and I want to create
> >
> > A B
> > A C
> > B C
> > D D
> > D E
> >
> > What's the best way to do this? If I join the RDD with itself, I will end up
> > with A A, which I do not want. I can't do distinct, as that will filter out the
> > D D, which I want.
> >
> > Any pointers? Thanks.
> >
> > Best Regards,
> > Sonal
> > Nube Technologies

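For reference, here is a minimal sketch of the row-ID idea from this thread, written with plain Java collections rather than the Spark API so that it runs standalone. The names UniquePairs, pairsWithinKey and row are hypothetical, made up for illustration. It assumes the wanted output is each unordered pair listed once, so it keeps only pairs whose left row ID is smaller than the right one, drops the IDs when building the result, and de-duplicates afterwards (otherwise the sample data would give D E twice).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UniquePairs {
    // Hypothetical helper: builds one (key, value) input row.
    static Map.Entry<String, String> row(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // Returns "left right" strings for values that share a key and come from different rows.
    static List<String> pairsWithinKey(List<Map.Entry<String, String>> rows) {
        // Step 1: tag every row with its position (the "row ID") and group by key.
        Map<String, List<Map.Entry<Long, String>>> byKey = new LinkedHashMap<>();
        long rowId = 0;
        for (Map.Entry<String, String> r : rows) {
            byKey.computeIfAbsent(r.getKey(), k -> new ArrayList<>())
                 .add(new SimpleEntry<>(rowId++, r.getValue()));
        }

        // Step 2: self-join within each key. Keeping only leftId < rightId drops
        // self-pairs (A A) and mirrored pairs (B A), but keeps D D from two distinct rows.
        // Step 3: only the values go into the result, so the IDs are discarded here;
        // the set removes repeated pairs while preserving first-seen order.
        Set<String> pairs = new LinkedHashSet<>();
        for (List<Map.Entry<Long, String>> group : byKey.values()) {
            for (Map.Entry<Long, String> left : group) {
                for (Map.Entry<Long, String> right : group) {
                    if (left.getKey() < right.getKey()) {
                        pairs.add(left.getValue() + " " + right.getValue());
                    }
                }
            }
        }
        return new ArrayList<>(pairs);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rows = Arrays.asList(
            row("K1", "A"), row("K1", "B"), row("K1", "C"),
            row("K2", "D"), row("K2", "D"), row("K2", "E"));
        for (String pair : pairsWithinKey(rows)) {
            System.out.println(pair);
        }
    }
}
```

On the sample data from the original question this prints A B, A C, B C, D D and D E, one per line. In Spark the same three steps would presumably map to adding the index, joining on the key, and then a filter followed by a map that keeps only the values, but the exact Java API calls are not shown here.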