How to select random n records using mapreduce ?

2011-06-27 Thread Jeff Zhang
Hi all, I'd like to select N random records from a large amount of data using Hadoop, and I wonder how I can achieve this. Currently my idea is to let each mapper task select N / mapper_number records. Does anyone have experience with this? -- Best Regards, Jeff Zhang
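The per-mapper idea in the question can be sketched outside Hadoop as a plain Python simulation. This is only an illustration under stated assumptions, not a Hadoop job: each input split is modeled as an iterable, each "mapper" keeps its quota with reservoir sampling (Algorithm R), and the names `reservoir_sample` and `sample_across_mappers` are hypothetical helpers introduced here.

```python
import random
from typing import Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of up to k records from one stream (Algorithm R)."""
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


def sample_across_mappers(
    splits: Sequence[Iterable[T]], n: int, seed: Optional[int] = None
) -> List[T]:
    """Simulate the thread's idea: each mapper selects about n / mapper_number records."""
    rng = random.Random(seed)
    # Spread the remainder so the quotas add up to exactly n.
    base, extra = divmod(n, len(splits))
    sample: List[T] = []
    for idx, split in enumerate(splits):
        quota = base + (1 if idx < extra else 0)
        sample.extend(reservoir_sample(split, quota, rng))
    return sample


if __name__ == "__main__":
    # Three hypothetical input splits of 100 records each.
    splits = [range(0, 100), range(100, 200), range(200, 300)]
    print(sample_across_mappers(splits, 10, seed=1))
```

Note that fixed per-mapper quotas give every record the same chance of selection only when the splits hold the same number of records; a record in a small split is otherwise more likely to be picked than one in a large split. A split with fewer records than its quota also returns fewer records, so the total can fall short of N.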

RE: How to select random n records using mapreduce ?

2011-06-27 Thread Habermaas, William
[...] June 27, 2011 3:29 PM To: mapreduce-u...@hadoop.apache.org Cc: core-u...@hadoop.apache.org Subject: Re: How to select random n records using mapreduce ? The only solution I can think of is to create a counter in Hadoop that is incremented each time a mapper lets a record through. As soon [...]

RE: How to select random n records using mapreduce ?

2011-06-27 Thread Jeff.Schmitz
Wait - Habermaas, like in Critical Theory? -Original Message- From: Habermaas, William [mailto:william.haberm...@fatwire.com] Sent: Monday, June 27, 2011 2:55 PM To: common-user@hadoop.apache.org Subject: RE: How to select random n records using mapreduce ? I did something similar [...]

Re: How to select random n records using mapreduce ?

2011-06-27 Thread Matt Pouttu-Clarke
[...] 2011 3:29 PM To: mapreduce-u...@hadoop.apache.org Cc: core-u...@hadoop.apache.org Subject: Re: How to select random n records using mapreduce ? The only solution I can think of is to create a counter in Hadoop that is incremented each time a mapper lets a record through. As soon as the value [...]