Hi Bryan,

On Thu, 2006-02-02 at 12:06, Bryan A. Pendleton wrote:

> 1) If you fill up the space of a datanode, it appears to fail with the
> wrong exception and reload. This, combined with the currently simple
> block-allocation method (random), means that one "full" node can cause
> a big dropoff in NDFS write performance, as clients end up timing out
> some percent of the time when asked to talk to the "full" node, while
> the full node is busy reloading.
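[As a rough illustration of the allocation issue above: a minimal sketch, in the spirit of the fix discussed below, of choosing a target datanode with probability weighted by free space while skipping any node under a fixed free-space floor, so a "full" node is never offered blocks. All names and the threshold value here are hypothetical, not the actual NDFS code.]

```java
import java.util.*;

// Sketch only: weighted random choice of a datanode by free space,
// excluding nodes below a minimum-free-space floor. Illustrative names.
class SpaceAwareChooser {
    // Hypothetical floor; a real system would make this configurable.
    static final long MIN_FREE_BYTES = 64L * 1024 * 1024;

    // freeSpace maps node name -> bytes free; returns null if no node has room.
    static String chooseNode(Map<String, Long> freeSpace, Random rng) {
        long total = 0;
        List<Map.Entry<String, Long>> eligible = new ArrayList<>();
        for (Map.Entry<String, Long> e : freeSpace.entrySet()) {
            if (e.getValue() >= MIN_FREE_BYTES) { // "full" nodes never chosen
                eligible.add(e);
                total += e.getValue();
            }
        }
        if (eligible.isEmpty()) return null;
        // Pick a point in [0, total); walk the list until it is covered,
        // so nodes with more free space are proportionally more likely.
        long pick = (long) (rng.nextDouble() * total);
        for (Map.Entry<String, Long> e : eligible) {
            pick -= e.getValue();
            if (pick < 0) return e.getKey();
        }
        return eligible.get(eligible.size() - 1).getKey();
    }
}
```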
The existing code chooses a node probabilistically, with more weight given
to machines with more space available. I should probably change this so
that machines under a certain limit are just never chosen. I filed bug
HADOOP-26 for this. Are you running on nodes with varying amounts of disk
space? I've been doing testing on relatively clean sets of machines, so
I'm probably hitting this problem less often than I "should" be.

> 2) Running out of disk space also kills the task tracker. This has an
> especially devilish side effect - if you've finished all map tasks,
> and, during reduce, a task tracker runs out of space, it disappears,
> taking all of its completed maps with it. As far as I can tell, once
> mapping has completed once, lost map jobs never get rescheduled. This,
> in effect, means that running out of space on any task tracker node any
> time after map has completed prevents the job from finishing. If you
> run out of space, maybe you should stop taking tasks, and error out the
> *current* task, but still be available to distribute completed map
> results to the reduce tasks, no? Likewise, when map tasks are lost,
> shouldn't they be getting rescheduled on still-available nodes?

I'm surprised by this; a reduce task that fails should be rescheduled
elsewhere. If a map is lost, it too should be rescheduled. We made a few
major changes to MapReduce in the last week; when did you last do a
bringover from svn? But I agree that a node without space should no
longer offer task service to the system. I filed HADOOP-27 for this one.

> 3) It should be possible to improve on the problems of 2, even given
> severe space restrictions. Why can't a node be queried for how much
> space it has available before accepting a reduce task? It should be
> straightforward to figure out exactly how much space is needed to
> complete the reduce - 2x the sum of the appropriate partitions (room
> for the appended version, and room for a sorted output of that),
> right?
> If space is low, you don't allocate a reduce to a tasktracker that
> doesn't have enough room. Likewise, once a reduce job has entirely
> completed, the space from the original map/partition files it reduced
> can be freed. Thus, so long as at least one node in the cluster has
> room for at least the smallest reduce job, it should be possible to
> make progress, rather than failing out.

It's not easy to predict the size of the data emitted by either map or
reduce; the user can always emit insane strings of arbitrary length. I
think the right solution is to allow the administrator to set a "minimum
size" parameter. Indeed, I believe this param is still in place from
earlier DFS work. While this might be too conservative in many cases, an
almost-full disk should be a relatively rare edge case that shouldn't be
optimized for.

> 4) Finally, performance-wise, why is it that we don't mimic the Google
> MapReduce technique of starting duplicates of tasks when the overall
> job starts to near completion? I have a few slow machines in my
> cluster, which can usefully complete work on large runs, but are,
> unsurprisingly, extending the average completion time of my runs
> somewhat needlessly.

This "speculative execution" is exactly what happens in the latest
version of MapReduce.

Thanks for the bug tips,

--Mike

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers