Re: How can I limit reducers to one-per-node?

2013-02-10 Thread Michael Segel
Adding a combiner step first then reduce? On Feb 8, 2013, at 11:18 PM, Harsh J ha...@cloudera.com wrote: Hey David, There's no readily available way to do this today (you may be interested in MAPREDUCE-199 though) but if your Job scheduler's not doing multiple-assignments on reduce

RE: How can I limit reducers to one-per-node?

2013-02-10 Thread David Parks
[mailto:michael_se...@hotmail.com] Sent: Monday, February 11, 2013 8:30 AM To: user@hadoop.apache.org Subject: Re: How can I limit reducers to one-per-node? Adding a combiner step first then reduce? On Feb 8, 2013, at 11:18 PM, Harsh J ha...@cloudera.com wrote: Hey David, There's

RE: How can I limit reducers to one-per-node?

2013-02-10 Thread David Parks
across hosts enough to be reasonably safe. From: Ted Dunning [mailto:tdunn...@maprtech.com] Sent: Monday, February 11, 2013 12:55 PM To: user@hadoop.apache.org Subject: Re: How can I limit reducers to one-per-node? For crawler type apps, typically you direct all of the URL's to crawl from
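Ted's suggestion is the usual partition-by-host trick for crawlers: send every URL from a given host to the same reducer, so per-host politeness limits (connection caps, delays) can be enforced in one place. A minimal standalone Python sketch of what such a partition function computes (the function name and the use of MD5 are illustrative assumptions, not from the thread):

```python
import hashlib
from urllib.parse import urlparse

def partition_for_url(url: str, num_reducers: int) -> int:
    """Map a URL to a reducer index by its hostname, so all URLs
    from the same host land on the same reducer."""
    host = urlparse(url).netloc.lower()
    # Use a stable digest: Python's built-in hash() is salted per-process,
    # so it would scatter the same host across runs.
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_reducers
```

In an actual MapReduce job this logic would live in a custom `Partitioner` keyed on hostname; the sketch above only illustrates the hashing idea.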

How can I limit reducers to one-per-node?

2013-02-08 Thread David Parks
I have a cluster of boxes with 3 reducers per node. I want to limit a particular job to only run 1 reducer per node. This job is network IO bound, gathering images from a set of webservers. My job has certain parameters set to meet web politeness standards (e.g. limit connects and

Re: How can I limit reducers to one-per-node?

2013-02-08 Thread Nan Zhu
I think setting tasktracker.reduce.tasks.maximum to 1 may meet your requirement. Best, -- Nan Zhu School of Computer Science, McGill University On Friday, 8 February, 2013 at 10:54 PM, David Parks wrote: I have a cluster of boxes with 3 reducers per node. I want to limit a particular
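For reference, the knob Nan mentions is a TaskTracker-side, cluster-wide setting rather than a per-job one; in Hadoop 1.x the canonical property name is `mapred.tasktracker.reduce.tasks.maximum`, set in mapred-site.xml on each node (a sketch, assuming Hadoop 1.x property naming):

```xml
<!-- mapred-site.xml on each TaskTracker: cap concurrent reduce tasks at 1 -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```

Note this caps reduce slots for every job on that node, not just the network-bound one, which is why it may not suit a shared cluster.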

Re: How can I limit reducers to one-per-node?

2013-02-08 Thread Nan Zhu
(using 15 m1.xlarge boxes which come with 3 reducer slots configured by default). From: Nan Zhu [mailto:zhunans...@gmail.com] Sent: Saturday, February 09, 2013 10:59 AM To: user@hadoop.apache.org Subject: Re: How can I limit reducers to one-per

RE: How can I limit reducers to one-per-node?

2013-02-08 Thread David Parks
to a different node, but in the last run 3 nodes had nothing to do, and 3 other nodes had 2 reduce tasks assigned. From: Nan Zhu [mailto:zhunans...@gmail.com] Sent: Saturday, February 09, 2013 11:31 AM To: user@hadoop.apache.org Subject: Re: How can I limit reducers to one-per-node? I haven't

Re: How can I limit reducers to one-per-node?

2013-02-08 Thread Nan Zhu
] Sent: Saturday, February 09, 2013 11:31 AM To: user@hadoop.apache.org Subject: Re: How can I limit reducers to one-per-node? I haven't used AWS MR before… if your instances are configured with 3 reducer slots, it means that 3 reducers can run at the same

Re: How can I limit reducers to one-per-node?

2013-02-08 Thread Harsh J
Hey David, There's no readily available way to do this today (you may be interested in MAPREDUCE-199 though) but if your Job scheduler's not doing multiple-assignments on reduce tasks, then only one is assigned per TT heartbeat, which gives you almost what you're looking for: 1 reduce task per