The suggestion to add a combiner is meant to help reduce the shuffle load (and perhaps reduce the number of reducers needed?), but it doesn't affect how a set number of reduce tasks gets scheduled, nor does the scheduler currently care whether you add that step or not.
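
To make that concrete, here is a toy sketch in plain Python. It is not the Hadoop API: the names map_task, combine, partition and run are made up for illustration, and the "job" is a simple word count. It only shows that a combiner shrinks what goes through the shuffle, while the number of reduce tasks stays whatever the job was configured with.

```python
from collections import Counter


def map_task(lines):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(pairs):
    """Pre-aggregate a single map task's output locally, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(pairs, num_reducers):
    """Route each pair to a reducer by key; num_reducers is fixed by the job."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        # Deterministic stand-in for a hash partitioner (assumption).
        buckets[sum(key.encode()) % num_reducers].append((key, value))
    return buckets


def run(splits, num_reducers, use_combiner):
    """Return (records shuffled, number of reduce tasks, final counts)."""
    shuffled = []
    for split in splits:  # one map task per input split
        out = map_task(split)
        if use_combiner:
            out = combine(out)
        shuffled.extend(out)
    buckets = partition(shuffled, num_reducers)
    results = Counter()
    for bucket in buckets:  # each bucket stands for one reduce task
        for key, value in bucket:
            results[key] += value
    return len(shuffled), len(buckets), dict(results)


splits = [["a a a b", "a b"], ["b b c", "c c c"]]
plain = run(splits, num_reducers=3, use_combiner=False)
combined = run(splits, num_reducers=3, use_combiner=True)

print(plain[0], combined[0])  # shuffled records: 12 4
print(plain[1], combined[1])  # reduce tasks: 3 3
```

The final counts are identical in both runs; only the shuffle volume changes. Where those three reduce tasks land on the cluster is still entirely up to the scheduler.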

On Mon, Feb 11, 2013 at 7:59 AM, David Parks <[email protected]> wrote:
> I guess the FairScheduler is doing multiple assignments per heartbeat, hence
> the behavior of multiple reduce tasks per node even when they should
> otherwise be full distributed.
>
> Adding a combiner will change this behavior? Could you explain more?
>
> Thanks!
>
> David
>
> From: Michael Segel [mailto:[email protected]]
> Sent: Monday, February 11, 2013 8:30 AM
> To: [email protected]
> Subject: Re: How can I limit reducers to one-per-node?
>
> Adding a combiner step first then reduce?
>
> On Feb 8, 2013, at 11:18 PM, Harsh J <[email protected]> wrote:
>
> Hey David,
>
> There's no readily available way to do this today (you may be
> interested in MAPREDUCE-199 though) but if your Job scheduler's not
> doing multiple-assignments on reduce tasks, then only one is assigned
> per TT heartbeat, which gives you almost what you're looking for: 1
> reduce task per node, round-robin'd (roughly).
>
> On Sat, Feb 9, 2013 at 9:24 AM, David Parks <[email protected]> wrote:
>
> I have a cluster of boxes with 3 reducers per node. I want to limit a
> particular job to only run 1 reducer per node.
>
> This job is network IO bound, gathering images from a set of webservers.
>
> My job has certain parameters set to meet “web politeness” standards (e.g.
> limit connects and connection frequency).
>
> If this job runs from multiple reducers on the same node, those per-host
> limits will be violated. Also, this is a shared environment and I don’t
> want long running network bound jobs uselessly taking up all reduce slots.
>
> --
> Harsh J
>
> Michael Segel | (m) 312.755.9623
> Segel and Associates

--
Harsh J
