Re: Why separate Map/Reduce task limits per node ?

2008-10-28 Thread Doug Balog
Thanks, Alex. I found a JIRA issue that relates to my question, HADOOP-3420 (https://issues.apache.org/jira/browse/HADOOP-3420). If I decide to do something about this, I'll follow up on HADOOP-3420. Thanks, DougB On Oct 28, 2008, at 5:49 PM, Alex Loddengaard wrote: I understand your question now, Doug; thanks fo

Re: Why separate Map/Reduce task limits per node ?

2008-10-28 Thread Doug Cutting
Alex Loddengaard wrote: That's the best I can do I think. Can others chime in? Another complicating factor is that, if a node dies, reduce tasks can be stalled waiting for map data to be re-generated. So if all tasks were scheduled out of a single pool, one would need to be careful to never
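The stall hazard described above can be made concrete with a toy model. This is a sketch under stated assumptions, not how the Hadoop JobTracker actually schedules tasks. It assumes one slot per core on a hypothetical 8-slot node, that reduces waiting for lost map output keep holding their slots, and that each free slot re-runs one lost map task per round. The function `rounds_to_regenerate` and its parameters are hypothetical.

```python
import math
from typing import Optional


def rounds_to_regenerate(lost_maps: int, total_slots: int,
                         waiting_reduces: int, reserved_map_slots: int) -> Optional[int]:
    """Toy model (hypothetical): rounds needed to re-run lost map tasks.

    reserved_map_slots == 0 models a single shared pool of slots;
    reserved_map_slots > 0 models separate map and reduce limits.
    Returns None if the lost map output can never be regenerated.
    """
    if lost_maps == 0:
        return 0
    # Waiting reduces hold their slots, but can never take a slot reserved for maps.
    reduces_holding = min(waiting_reduces, total_slots - reserved_map_slots)
    free_for_maps = total_slots - reduces_holding
    if free_for_maps == 0:
        # Every slot is held by a reduce waiting for map output that no free
        # slot can regenerate: in this model the job stalls.
        return None
    # Each free slot re-runs one lost map task per round.
    return math.ceil(lost_maps / free_for_maps)


# Hypothetical 8-slot node with 8 reduces stalled on 3 lost map outputs.
print(rounds_to_regenerate(3, 8, 8, reserved_map_slots=0))  # single pool -> None
print(rounds_to_regenerate(3, 8, 8, reserved_map_slots=4))  # 4 map + 4 reduce -> 1
```

In this model the shared pool stalls as soon as waiting reduces occupy every slot, while reserving even a few map slots guarantees that lost map output is eventually regenerated.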

Re: Why separate Map/Reduce task limits per node ?

2008-10-28 Thread Alex Loddengaard
I understand your question now, Doug; thanks for clarifying. However, I don't think I can give you a great answer. I'll give it a shot, though: in theory, a single task configuration does seem like it would improve utilization, but it might also make things worse. For example, generally sp
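The theoretical utilization gain can be shown with a little arithmetic. This sketch assumes a hypothetical 8-slot node split into 4 map and 4 reduce slots under separate limits, compared with 8 generic slots under a single limit. `busy_slots` is a hypothetical helper, not part of Hadoop.

```python
def busy_slots(pending_maps: int, pending_reduces: int,
               map_slots: int, reduce_slots: int, single_pool: bool) -> int:
    """Toy model (hypothetical): how many slots on one node can be kept busy."""
    if single_pool:
        # Any task may use any slot, up to the combined limit.
        return min(pending_maps + pending_reduces, map_slots + reduce_slots)
    # Maps and reduces are capped independently; spare slots of one kind sit idle.
    return min(pending_maps, map_slots) + min(pending_reduces, reduce_slots)


# Map phase of a job: plenty of maps waiting, no reduces runnable yet.
print(busy_slots(100, 0, 4, 4, single_pool=False))  # -> 4
print(busy_slots(100, 0, 4, 4, single_pool=True))   # -> 8
```

The model captures only the upside. It ignores how differently map and reduce tasks behave, which is why a single pool might also make things worse.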

Re: Why separate Map/Reduce task limits per node ?

2008-10-28 Thread Doug Balog
Hi Alex, I'm sorry; I think you misunderstood my question. Let me explain some more. I have a Hadoop cluster of dual quad-core machines. I'm using hadoop-0.18.1 with Matei's fairscheduler patch (https://issues.apache.org/jira/browse/HADOOP-3746) running in FIFO mode. I have about 5 different jobs

Re: Why separate Map/Reduce task limits per node ?

2008-10-27 Thread Alex Loddengaard
In most jobs, map and reduce tasks are significantly different, and their runtimes vary as well. The number of reducers also determines how many output files you have. So when you want a single output file, having a single generic task limit would mean that you'd also have one mapp

Why separate Map/Reduce task limits per node ?

2008-10-27 Thread Doug Balog
Hi, I've been wondering why there are separate task limits for map and reduce. Why not a single generic task limit per node? Thanks for any insight, Doug