Hi,

As far as I know, you cannot recover the mapper output. In any case, one
should never have a fetcher running for three days. It's far better to
generate a large number of smaller segments and fetch them sequentially. If an
error occurs, only a small portion of the crawl is affected. We never run a
fetcher for more than one hour; instead we run many in a row, and sometimes
concurrently.
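
The "many small segments in a row" approach could be scripted roughly as below. This is only a sketch, assuming the Nutch 1.x command-line tools (generate, fetch, parse, updatedb). The crawl_rounds function, the NUTCH variable, and the -topN and round values are made up for illustration and are not from our setup.

```shell
# Hypothetical wrapper: run several short generate/fetch/parse/updatedb rounds
# instead of one multi-day fetch. All names and values here are illustrative.
NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher (assumption)

# usage: crawl_rounds <crawldb> <segments_dir> <topN> <rounds> [threads]
crawl_rounds() (
  crawldb=$1; segments=$2; topn=$3; rounds=$4; threads=${5:-25}
  i=0
  while [ "$i" -lt "$rounds" ]; do
    i=$((i + 1))
    # A small -topN keeps each fetch list, and so each fetch run, short.
    "$NUTCH" generate "$crawldb" "$segments" -topN "$topn" || break
    # Segment directories are named by timestamp, so the newest sorts last.
    segment=$(ls -d "$segments"/*/ 2>/dev/null | sort | tail -n 1)
    segment=${segment%/}
    [ -n "$segment" ] || break
    if ! "$NUTCH" fetch "$segment" -threads "$threads"; then
      # A failure costs one small segment, not days of fetching.
      echo "fetch failed for $segment, moving on" >&2
      continue
    fi
    "$NUTCH" parse "$segment" || continue
    "$NUTCH" updatedb "$crawldb" "$segment" || break
  done
)

# Example call (values are placeholders):
#   crawl_rounds crawl/crawldb crawl/segments 50000 20
```

Tune -topN so that one round finishes in about an hour on your hardware.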

Cheers,

 
-----Original message-----
> From:Mohammad wrk <mhd...@yahoo.com>
> Sent: Fri 26-Oct-2012 00:47
> To: user@nutch.apache.org
> Subject: How to recover data from /tmp/hadoop-myuser
> 
> Hi,
> 
> My fetch cycle (nutch fetch ./segments/20121021205343/ -threads 25) failed
> after 3 days with the error below. Under the segment folder
> (./segments/20121021205343/) there is only the generated fetch list
> (crawl_generate) and no content. However, /tmp/hadoop-myuser/ holds 96G of
> data. Is there a way to recover this data and parse the segment?
> 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
>         at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-10-24 14:43:29,671 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1318)
>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1354)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1327)
> 
> 
> Thanks,
> Mohammad
