Thanks Alex.
I found a JIRA that relates to my question
https://issues.apache.org/jira/browse/HADOOP-3420
If I decide to do something about this, I'll follow up with HADOOP-3420.
Thanks,
DougB
On Oct 28, 2008, at 5:49 PM, Alex Loddengaard wrote:
Alex Loddengaard wrote:
That's the best I can do, I think. Can others chime in?
Another complicating factor is that, if a node dies, reduce tasks can be
stalled waiting for map data to be re-generated. So if all tasks were
scheduled out of a single pool, one would need to be careful to never let
waiting reduce tasks occupy every slot, since the slots needed to re-run
the lost maps would then never free up.
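To make that stall concrete, here is a toy sketch (plain Java, not Hadoop
code; the slot counts are hypothetical):

  // Toy model of a single per-node slot pool; all numbers are made up.
  public class SinglePoolDeadlock {
      public static void main(String[] args) {
          int slots = 8;            // one generic limit covering maps and reduces
          int runningReduces = 8;   // reduces have claimed every slot...
          int mapsToRerun = 1;      // ...and a dead node's map must be re-run
          int freeSlots = slots - runningReduces;
          // The reduces block on the missing map output, but the map cannot
          // be scheduled because no slot is free: a scheduling deadlock.
          System.out.println("deadlocked = " + (mapsToRerun > 0 && freeSlots == 0));
      }
  }

With a separate map limit, a map slot stays available for the re-run, so the
stall resolves itself.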
I understand your question now, Doug; thanks for clarifying. However, I
don't think I can give you a great answer. I'll give it a shot, though:
It does seem like having a single task configuration would in theory improve
utilization, but it might also make things worse. For example, generally
speaking, …
Hi Alex, I'm sorry, I think you misunderstood my question. Let me
explain some more.
I have a Hadoop cluster of dual quad-core machines.
I'm using hadoop-0.18.1 with Matei's fair scheduler patch
(https://issues.apache.org/jira/browse/HADOOP-3746) running in FIFO mode.
I have about 5 different jobs.
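For anyone reproducing this setup: the scheduler is plugged in through
hadoop-site.xml. The property name below is the one the pluggable-scheduler
work uses, so treat it as a best guess for a patched 0.18.1:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>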
In most jobs, map and reduce tasks are significantly different, and their
runtimes vary as well. The number of reducers also determines how many
output files you have. So in the case where you want one output file,
having a single generic task limit would mean that you'd also have one
mapper.
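For context, the reducer count Alex is referring to is a per-job setting,
not a per-node limit; one reduce task yields a single part-00000 output
file. A minimal sketch with the old mapred API, inside your job setup:

  // One reduce task => all map output funnels into a single part-00000 file.
  org.apache.hadoop.mapred.JobConf conf = new org.apache.hadoop.mapred.JobConf();
  conf.setNumReduceTasks(1);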
Hi,
I've been wondering why there are separate task limits for map and
reduce.
Why not a single generic task limit per node?
Thanks for any insight,
Doug
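For reference, the two limits in question are set per TaskTracker in
hadoop-site.xml; the values below are only examples:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>   <!-- e.g. one map slot per core on a dual quad-core box -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>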