Hi,
I have a very simple use case. I have an RDD like the following:

d = [[1,2,3,4],[1,5,2,3],[2,3,4,5]]

Now I want to remove the rows that have a duplicate value in a given
column and return the remaining rows.
For example, if I want to remove duplicates based on column 1, I would
drop either row 1 or row 2 from the final result, because the first and
second rows both have the same element (1) in column 1 and are therefore
duplicates.
So, a possible result is:

output = [[1,2,3,4],[2,3,4,5]]
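
To make the intended behaviour concrete, here is a minimal plain-Python sketch of what I mean (this is not Spark code, just the semantics on a local list). The function name drop_duplicates_by_column is made up for illustration, it takes a 0-based column index (so "column 1" above is index 0), and keeping the first row seen is an arbitrary choice; keeping either row would be fine for me.

```python
def drop_duplicates_by_column(rows, col):
    """Keep the first row seen for each distinct value in column `col` (0-based)."""
    seen = set()      # values of the key column encountered so far
    result = []
    for row in rows:
        key = row[col]
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


d = [[1, 2, 3, 4], [1, 5, 2, 3], [2, 3, 4, 5]]

# "column 1" in the text above is index 0 here
output = drop_duplicates_by_column(d, 0)
print(output)
```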

How do I do this in Spark?
Thanks
