Re: v0.20.203: Do we need to set the number of reducers every time?

2012-04-06 Thread Harsh J
Piyush, On Sat, Apr 7, 2012 at 3:02 AM, Piyush Kansal wrote: > So, do I need to set the number of reducers depending on data size, every > time I want to run MapReduce? Or does the Hadoop framework automatically invoke > the required number of reducers at run time? Plain Apache Hadoop does
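As the reply indicates, plain Hadoop does not auto-scale reducers with input size; the count is fixed per job (default 1 unless configured). A minimal sketch of how a driver might derive a reducer count from the input size before calling `job.setNumReduceTasks(...)` — the helper name and the 1 GB-per-reducer target are illustrative assumptions, not Hadoop APIs:

```java
// Sketch: derive a reducer count from input size before submitting a job.
// Hadoop itself will not do this; the 1 GB-per-reducer target is an
// assumed tuning value, not a Hadoop default.
public class ReducerCountSketch {
    static final long TARGET_BYTES_PER_REDUCER = 1L << 30; // ~1 GB, assumed target

    static int suggestReducers(long totalInputBytes) {
        // Ceiling division, with a floor of one reducer.
        long n = (totalInputBytes + TARGET_BYTES_PER_REDUCER - 1) / TARGET_BYTES_PER_REDUCER;
        return (int) Math.max(1, n);
    }

    public static void main(String[] args) {
        System.out.println(suggestReducers(10L << 20)); // 10 MB input -> 1
        System.out.println(suggestReducers(10L << 30)); // 10 GB input -> 10
        // In a real driver: job.setNumReduceTasks(suggestReducers(inputSize));
    }
}
```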

v0.20.203: Do we need to set the number of reducers every time?

2012-04-06 Thread Piyush Kansal
Hi, I am using Partitioner and Grouper classes in my program. And let's say the data which I want to process using MapReduce varies in size: it can be just 10MB or can go up to 10GB. So, do I need to set the number of reducers depending on data size every time I want to run MapReduce? Or does the Hadoop

Re: Is there a way to ensure that different jobs have the same number of reducers

2011-06-29 Thread Trevor Adams
…read the output of the first job to > run through the values of A. > One issue is that assuming the same hashing partitioner is used and there are > the same number of reducers, a specific reducer, say reducer 12, > will receive the same keys in both jobs and thus part-r-00012 from the >

Is there a way to ensure that different jobs have the same number of reducers

2011-06-29 Thread Steve Lewis
the same hashing partitioner is used and there are the same number of reducers, a specific reducer, say reducer 12, will receive the same keys in both jobs and thus part-r-00012 from the first job is the only file reducer 12 will need to read. Can I guarantee (without restricting the number of
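The alignment the thread relies on can be demonstrated without a cluster. The sketch below reimplements the arithmetic of Hadoop's default HashPartitioner (`(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`) in plain Java; the key strings and reducer counts are made-up examples:

```java
// Sketch: why identical reducer counts make part files line up across jobs.
// The partition formula mirrors Hadoop's default HashPartitioner; run
// standalone here for illustration.
public class HashAlignmentSketch {
    static int partition(String key, int numReduceTasks) {
        // Mask off the sign bit so negative hashCodes still map to [0, n).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 20; // must be the SAME in both jobs
        String key = "some-join-key";
        // Same key + same reducer count -> same partition in both jobs, so
        // reducer 12 of job two only needs part-r-00012 of job one's output.
        System.out.println(partition(key, reducers) == partition(key, reducers)); // true
        // With different reducer counts, the mapping shifts and the
        // file-to-file correspondence is no longer guaranteed.
    }
}
```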

Re: Number of Reducers Set to One

2011-05-13 Thread Robert Evans
You could merge the side effect files before running the second map job if you want to, or you could just leave them as separate files and then read each one in the mapper. If there are lots of files then the namenode may get hit too much, and slow down the entire cluster. This is mitigated by
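The merge-before-the-second-job option can be sketched with local files. This uses plain `java.nio` for illustration only; on a real cluster the equivalent would go through the HDFS `FileSystem` API or `hadoop fs -getmerge`, and the part-file names below are invented:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch: concatenate many small "side effect" files into one before the
// next job, so the NameNode tracks one file instead of many tiny ones.
public class MergeSideFilesSketch {
    // Pure merge of the lines of several part files, preserving order.
    static List<String> mergeLines(List<List<String>> parts) {
        List<String> out = new ArrayList<>();
        for (List<String> p : parts) out.addAll(p);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("side-effect");
        Path a = Files.writeString(dir.resolve("part-00000"), "alpha\n");
        Path b = Files.writeString(dir.resolve("part-00001"), "beta\n");
        List<String> merged = mergeLines(List.of(
                Files.readAllLines(a, StandardCharsets.UTF_8),
                Files.readAllLines(b, StandardCharsets.UTF_8)));
        Files.write(dir.resolve("merged"), merged, StandardCharsets.UTF_8);
        System.out.println(merged); // [alpha, beta]
    }
}
```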

Re: Number of Reducers Set to One

2011-05-12 Thread Geoffry Roberts
Bobby, Thanks for such a thoughtful response. I have a data set that represents all the people that pass through Las Vegas over a course of time, say five years, which comes to about 175 - 200 million people. Each record is a person, and it contains fields for where they came from, left to; tim

Re: Number of Reducers Set to One

2011-05-12 Thread Robert Evans
Geoffry, That really depends on how much data you are processing, and the algorithm you need to use to process the data. I did something similar a while ago with a medium amount of data and we saw significant speed up by first assigning each record a new key based off of the expected range of
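The re-keying idea described here — assigning each record a new key from the expected range so work spreads over many reducers instead of one — can be sketched as a range partitioner. The range bounds and partition count below are invented for illustration; they are not from the thread:

```java
// Sketch: partition records by where their key falls in a known range, so
// each reducer receives a contiguous, ordered slice and otherwise-global
// running computations can proceed per slice.
public class RangeKeySketch {
    static int rangePartition(long key, long min, long max, int partitions) {
        // Map [min, max) linearly onto partition indices [0, partitions - 1].
        if (key >= max) return partitions - 1;
        return (int) ((key - min) * partitions / (max - min));
    }

    public static void main(String[] args) {
        // e.g. timestamps spanning five years, split across 4 reducers:
        long min = 0, max = 5L * 365 * 24 * 3600; // seconds in ~5 years
        System.out.println(rangePartition(min, min, max, 4));     // 0
        System.out.println(rangePartition(max / 2, min, max, 4)); // 2
        System.out.println(rangePartition(max - 1, min, max, 4)); // 3
    }
}
```

Hadoop ships a TotalOrderPartitioner for this style of partitioning when exact ordering across reducer outputs is needed; the hand-rolled version above just shows the arithmetic.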

Number of Reducers Set to One

2011-05-12 Thread Geoffry Roberts
All, I am mostly seeking confirmation as to my thinking on this matter. I have an MR job that I believe will force me into using a single reducer. The nature of the process is one where calculations performed on a given record rely on certain accumulated values whose calculation depends on rollin

Hadoop scheduler and number of reducers config

2011-04-13 Thread Hrishikesh Gadre
Hello All, I have a question regarding configuring the number-of-reducers property in the case of a non-FIFO scheduler (either the Capacity or Fair-share scheduler). As per the guidelines on the Hadoop wiki page, we should set number of reducers = 0.75 * maximum_reduce_slots_available_in_cluster (minimum
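The rule of thumb quoted from the wiki is a one-line computation; the sketch below just makes the arithmetic concrete. The slot count is a made-up example, and which slot figure to use under a Capacity or Fair scheduler (cluster-wide capacity vs. the queue's share) is exactly the open question in this thread:

```java
// Sketch of the wiki rule of thumb: size reducers from the cluster's
// reduce-slot capacity rather than per-job data volume. The slot count
// here is an assumed example value.
public class SlotBasedReducers {
    static int reducersFor(int maxReduceSlots, double factor) {
        return Math.max(1, (int) Math.floor(maxReduceSlots * factor));
    }

    public static void main(String[] args) {
        int slots = 40; // assumed cluster-wide reduce slots
        System.out.println(reducersFor(slots, 0.75)); // 30
    }
}
```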

Re: number of reducers

2010-06-06 Thread Torsten Curdt
Great. Just thought I'd be missing something :) On Mon, Jun 7, 2010 at 01:55, Aaron Kimball wrote: > Yes. LJR sets your number of reduce tasks to 1 if the number is >= 1. > Subnote: I've posted a patch to fix this at MAPREDUCE-434, but it's not > committed. > - Aaron > > On Mon, Jun 7, 2010 at 1:

Re: number of reducers

2010-06-06 Thread Aaron Kimball
Yes. LJR sets your number of reduce tasks to 1 if the number is >= 1. Subnote: I've posted a patch to fix this at MAPREDUCE-434, but it's not committed. - Aaron On Mon, Jun 7, 2010 at 1:42 AM, Torsten Curdt wrote: > I see only one. > > Could it be that using the LocalJobRunner interferes here?

Re: number of reducers

2010-06-06 Thread Torsten Curdt
I see only one. Could it be that using the LocalJobRunner interferes here? On Mon, Jun 7, 2010 at 01:31, Eric Sammer wrote: > Torsten: > > To clarify, how many reducers do you actually see? (i.e. Do you see 4 > reducers or 1?) It should work as you expect. > > On Sun, Jun 6, 2010 at 1:33 PM, Tor

Re: number of reducers

2010-06-06 Thread Eric Sammer
Torsten: To clarify, how many reducers do you actually see? (i.e. Do you see 4 reducers or 1?) It should work as you expect. On Sun, Jun 6, 2010 at 1:33 PM, Torsten Curdt wrote: > When I set > >  job.setPartitionerClass(MyPartitioner.class); >  job.setNumReduceTasks(4); > > I would expect to see

number of reducers

2010-06-06 Thread Torsten Curdt
When I set job.setPartitionerClass(MyPartitioner.class); job.setNumReduceTasks(4); I would expect to see my MyPartitioner get called with getPartition(key, value, 4) but still I see it only get called with 1. I also tried setting conf.set("mapred.map.tasks.speculative.exe
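What the poster expected can be sketched standalone. The class below is modeled on the `org.apache.hadoop.mapreduce.Partitioner` contract (a `getPartition(key, value, numPartitions)` method) but written in plain Java so it runs without a cluster; the key and value strings are invented. Per the replies above, under the LocalJobRunner `numPartitions` arrives as 1 regardless of `setNumReduceTasks(4)`:

```java
// Sketch of a hash-style partitioner, mirroring the shape of Hadoop's
// Partitioner.getPartition(key, value, numPartitions). On a real cluster
// with 4 reduce tasks, numPartitions would be 4 here.
public class MyPartitionerSketch {
    static int getPartition(String key, String value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int p = getPartition("user-42", "payload", 4);
        System.out.println(p >= 0 && p < 4); // true: always within [0, numPartitions)
    }
}
```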