Use combineByKey. Taking the top 10 as an example (the bottom 10 works similarly): add each element to a per-key list, and whenever the list grows larger than 10, delete the smallest elements until its size is back to 10.

-Sven
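
The combineByKey approach described above might look roughly like the sketch below. It is only an illustration, not tested against the original poster's data: the object name `TopNPerKey`, the helper `trim`, the sample values, and the local-mode `SparkContext` are all assumptions made for the example. For simplicity it re-sorts the list on every merge instead of removing elements one at a time.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on Spark < 1.3)

object TopNPerKey {
  type Entry = (String, Int) // (word, count)
  val N = 10

  // Hypothetical helper: keep only the n entries with the largest counts.
  def trim(entries: List[Entry], n: Int): List[Entry] =
    entries.sortBy(_._2)(Ordering[Int].reverse).take(n)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for the sketch; use your own cluster settings.
    val conf = new SparkConf().setAppName("TopNPerKey").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Made-up sample data in the shape Array[(Int, (String, Int))] from the question.
    val array: Array[(Int, Entry)] = Array(
      (1, ("a", 5)), (1, ("b", 3)), (2, ("c", 7)), (1, ("d", 9)), (2, ("e", 1)))

    val topPerIndex = sc.parallelize(array).combineByKey[List[Entry]](
      (e: Entry) => List(e),                                // createCombiner: start a list
      (acc: List[Entry], e: Entry) => trim(e :: acc, N),    // mergeValue: add, then cut back to N
      (a: List[Entry], b: List[Entry]) => trim(a ::: b, N)) // mergeCombiners: merge partial lists

    topPerIndex.collect().foreach(println)
    sc.stop()
  }
}
```

For the bottom 10, the same structure should work with `trim` sorting in ascending order instead. Because each per-key list never holds more than N + 1 elements during a merge, the memory used per index stays bounded regardless of how many entries that index has.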

On Tue, Jan 27, 2015 at 3:35 AM, kundan kumar <iitr.kun...@gmail.com> wrote:
> I have an array of the form
>
> val array: Array[(Int, (String, Int))] = Array(
>   (idx1,(word1,count1)),
>   (idx2,(word2,count2)),
>   (idx1,(word1,count1)),
>   (idx3,(word3,count1)),
>   (idx4,(word4,count4)))....
>
> I want to get the top 10 and bottom 10 elements from this array for each
> index (idx1,idx2,....). Basically I want the top 10 most occurring and
> bottom 10 least occurring elements for each index value.
>
> Please suggest how to achieve this in Spark in the most efficient way. I have
> tried it using for loops for each index, but this makes the program too slow
> and runs sequentially.
>
> Thanks,
>
> Kundan
>

--
http://sites.google.com/site/krasser/?utm_source=sig