Hi,

For a few weeks now, we have been experiencing a rather annoying problem with a Nutch/Hadoop installation.
It's a very simple setup: the Hadoop configuration is the default one from Nutch, and the Hadoop version is the hadoop-0.17.1 jar shipped with Nutch. During the inject operation, we now see the following errors in one of the MapReduce tasks:

2009-01-21 00:00:03,344 WARN fs.AllocatorPerContext - org.apache.hadoop.util.DiskChecker$DiskErrorException: can not create directory: /tmp/hadoop-moteur/mapred/local/taskTracker/jobcache/job_local_1/reduce_3fm7iw/output
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:73)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:253)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:298)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:159)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)

2009-01-21 00:00:03,347 WARN mapred.LocalJobRunner - job_local_1 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_local_1/reduce_3fm7iw/output/map_0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:159)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)

2009-01-21 00:00:04,211 FATAL crawl.Injector - Injector: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
        at org.apache.nutch.crawl.Injector.run(Injector.java:190)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Injector.main(Injector.java:180)

The root of the problem is that the directory "reduce_3fm7iw" cannot be created: the job_local_1/ directory already contains 32k subdirectories, and we are hitting the filesystem's limit on the number of entries in a single directory.

Is there any way to configure Hadoop to limit the number of directories it creates (even if that makes it slower), or is there another solution to this problem? I wonder whether setting dfs.max.objects would solve it, but I'm not sure what consequences that might have.

If additional information is needed, I'll be glad to provide it.

Thanks.

--
Guillaume
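
P.S. To be explicit about what I mean by the last point: I was considering adding something like the following to our hadoop-site.xml. This is only a sketch of the idea, not a tested change; the value 30000 is an arbitrary number I picked because our filesystem limit is around 32k, and I don't actually know whether this property affects the local mapred/ directories at all, which is exactly what I'm unsure about.

<configuration>
  <!-- Hypothetical change, untested: cap the number of objects.
       30000 is an arbitrary illustrative value, chosen only
       because our per-directory limit seems to be around 32k. -->
  <property>
    <name>dfs.max.objects</name>
    <value>30000</value>
  </property>
</configuration>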