subject:"how to find top N values using map\-reduce \?"

Re: how to find top N values using map-reduce ?

2013-02-02 Thread Niels Basjes

My suggestion is to use secondary sort with a single reducer. That way you can easily extract the top N. If you want to get the top N% you'll need an additional phase to determine how many records this N% really is. -- Kind regards, Niels Basjes (sent from mobile) On 2 Feb. 2013
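The two phases described above can be sketched in plain Python, without Hadoop. This is only an illustration under stated assumptions: `count_records` stands in for a separate counting job, the `sorted` call stands in for the descending order a secondary sort would deliver to the single reducer, and rounding N% up to a whole record is a choice made here, not something the thread specifies. All function names are hypothetical.

```python
from typing import List, Sequence


def count_records(splits: Sequence[Sequence[int]]) -> int:
    """Phase 1 (assumed counting job): total number of records in all splits."""
    return sum(len(split) for split in splits)


def top_percent(splits: Sequence[Sequence[int]], pct: int) -> List[int]:
    """Phase 2: emit the first k records of the descending order.

    k is pct percent of the total, rounded up (an assumption of this sketch).
    """
    total = count_records(splits)
    k = (total * pct + 99) // 100  # integer ceiling of total * pct / 100
    # Stand-in for the sorted stream that reaches the single reducer.
    ordered = sorted((v for split in splits for v in split), reverse=True)
    return ordered[:k]


splits = [[5, 1, 9], [7, 3], [8, 2, 6, 4]]
result = top_percent(splits, 25)  # 9 records, 25% rounds up to 3 records
```

In a real job the reducer would not need to hold the whole stream: it could stop emitting after k records, since the input already arrives in sorted order.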

Re: how to find top N values using map-reduce ?

2013-02-02 Thread praveenesh kumar

My actual problem is to rank all values and then apply logic 1 to the top n% of values and logic 2 to the rest. 1st - Ranking? (need major suggestions here) 2nd - Find the top n% out of them. Then the rest is covered. Regards Praveenesh On Sat, Feb 2, 2013 at 1:42 PM, Lake Chang wrote: > there's one thing
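A rough sketch of the "rank, then branch" requirement, again in plain Python rather than as an MR job. The names `rank_and_split`, `logic_top` and `logic_rest` are hypothetical placeholders for the poster's two pieces of logic, ranks are assigned by position in descending order (ties are not given equal ranks), and the cutoff is rounded up; all of these are assumptions of the sketch.

```python
from typing import Callable, Iterable, List, Tuple


def rank_and_split(
    values: Iterable[int],
    pct: int,
    logic_top: Callable[[int], int],
    logic_rest: Callable[[int], int],
) -> List[Tuple[int, int, int]]:
    """Rank values in descending order and apply one of two functions.

    Returns (rank, value, processed_value) tuples. Values whose rank falls
    within the top pct percent go through logic_top, the others through
    logic_rest.
    """
    ordered = sorted(values, reverse=True)
    cutoff = (len(ordered) * pct + 99) // 100  # top pct percent, rounded up
    out = []
    for rank, value in enumerate(ordered, start=1):
        handler = logic_top if rank <= cutoff else logic_rest
        out.append((rank, value, handler(value)))
    return out


ranked = rank_and_split(
    [40, 10, 30, 20, 50],
    pct=40,
    logic_top=lambda v: v * 2,   # placeholder for "logic 1"
    logic_rest=lambda v: -v,     # placeholder for "logic 2"
)
```

With five values and `pct=40`, the cutoff is rank 2, so 50 and 40 go through the first function and the remaining three through the second.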

Re: how to find top N values using map-reduce ?

2013-02-02 Thread Russell Jurney

Maybe look at the Pig source to see how it does it? Russell Jurney http://datasyndrome.com On Feb 1, 2013, at 11:37 PM, praveenesh kumar wrote: > Thanks for that Russell. Unfortunately I can't use Pig. Need to write > my own MR job. I was wondering how it's usually done in the best way > possibl

Re: how to find top N values using map-reduce ?

2013-02-01 Thread praveenesh kumar

Thanks for that, Russell. Unfortunately I can't use Pig. I need to write my own MR job. I was wondering how it's usually done in the best way possible. Regards Praveenesh On Sat, Feb 2, 2013 at 1:00 PM, Russell Jurney wrote: > Pig. Datafu. 7 lines of code. > > https://gist.github.com/4696443 > https

Re: how to find top N values using map-reduce ?

2013-02-01 Thread Russell Jurney

Pig. Datafu. 7 lines of code. https://gist.github.com/4696443 https://github.com/linkedin/datafu On Fri, Feb 1, 2013 at 11:17 PM, praveenesh kumar wrote: > Actually what I am trying to find is the top n% of the whole data. > This n could be very large if my data is large. > > Assuming I have unifor

Re: how to find top N values using map-reduce ?

2013-02-01 Thread praveenesh kumar

Actually what I am trying to find is the top n% of the whole data. This n could be very large if my data is large. Assuming I have uniform rows of equal size and the total data size is 10 GB, then using the above-mentioned approach, if I have to take the top 10% of the whole data set, I need 10% of 10GB whi

Re: how to find top N values using map-reduce ?

2013-02-01 Thread Eugene Kirpichov

Hi, Can you tell me more about:
* How big is N?
* How big is the input dataset?
* How many mappers do you have?
* Do input splits correlate with the sorting criterion for top N?
Depending on the answers, very different strategies will be optimal. On Fri, Feb 1, 2013 at 9:05 PM, praveenesh kumar wro

how to find top N values using map-reduce ?

2013-02-01 Thread praveenesh kumar

I am looking for a better solution for this. One way to do this would be to find the top N values from each mapper and then find the top N out of them in one reducer. I am afraid that this won't work effectively if my N is larger than the number of values in my input split (or mapper input). The other way is
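The first approach mentioned above (a local top N per mapper, merged by one reducer) can be sketched as follows. This is plain Python standing in for the mapper and reducer roles, not Hadoop code, and the function names are hypothetical. It uses `heapq.nlargest` from the standard library to keep only N candidates per split.

```python
import heapq
from typing import Iterable, List


def map_top_n(split: Iterable[int], n: int) -> List[int]:
    """Mapper side: keep only the n largest values of one input split."""
    return heapq.nlargest(n, split)


def reduce_top_n(partials: Iterable[List[int]], n: int) -> List[int]:
    """Single reducer: merge per-mapper candidates and keep the global top n."""
    return heapq.nlargest(n, (v for part in partials for v in part))


splits = [[5, 1, 9], [7, 3], [8, 2, 6, 4]]
top3 = reduce_top_n((map_top_n(s, 3) for s in splits), 3)
```

Note that the second split holds fewer than three values, so its mapper simply forwards everything it has. The result stays correct in that case, but the mapper no longer reduces the data sent to the reducer, which appears to be the concern raised above for large N.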

8 matches
