[jira] Closed: (NUTCH-159) Specify temp/working directory for crawl

Andrzej Bialecki (JIRA) Thu, 17 Jan 2008 12:30:55 -0800

     [ https://issues.apache.org/jira/browse/NUTCH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrzej Bialecki  closed NUTCH-159.
-----------------------------------

       Resolution: Won't Fix
    Fix Version/s: 0.8

> Specify temp/working directory for crawl
> ----------------------------------------
>
>                 Key: NUTCH-159
>                 URL: https://issues.apache.org/jira/browse/NUTCH-159
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, indexer
>    Affects Versions: 0.8
>         Environment: Linux/Debian
>            Reporter: byron miller
>             Fix For: 0.8
>
>
> I ran a crawl of 100k web pages and got:
> org.apache.nutch.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:149)
>         at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:65)
>         at org.apache.nutch.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:178)
>         at org.apache.nutch.fs.NutchFileSystem.rename(NutchFileSystem.java:224)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:260)
>         at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:147)
>         ... 4 more
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>         at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:335)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:107)
> [EMAIL PROTECTED]:/data/nutch$ df -k
> It appears the crawl created a /tmp/nutch directory that filled up, even
> though I specified a db directory.
> Nutch needs a command-line parameter or a globally configurable work area
> (in place of /tmp) for the Nutch instance so that crawls won't fail.
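
The request above (a configurable work area rather than a fixed /tmp) can be
illustrated with a minimal sketch. This is not Nutch code and not how the
issue was handled: the class name WorkDirCheck and the property name
crawl.work.dir are hypothetical, chosen only for illustration. The sketch
reads a work directory from a system property, falls back to the JVM's
java.io.tmpdir, and checks usable space before a job would start.

```java
import java.io.File;

// Hypothetical helper, not part of Nutch.
final class WorkDirCheck {
    // Assumed property name for illustration only; Nutch does not define it.
    static final String WORK_DIR_PROPERTY = "crawl.work.dir";

    // Returns the configured work directory, or the JVM temp directory
    // when the property is unset or blank.
    static File resolveWorkDir() {
        String configured = System.getProperty(WORK_DIR_PROPERTY);
        if (configured == null || configured.trim().isEmpty()) {
            configured = System.getProperty("java.io.tmpdir");
        }
        return new File(configured);
    }

    // True if dir is an existing directory whose partition reports at
    // least requiredBytes of usable space.
    static boolean hasRoom(File dir, long requiredBytes) {
        return dir.isDirectory() && dir.getUsableSpace() >= requiredBytes;
    }

    public static void main(String[] args) {
        File workDir = resolveWorkDir();
        long required = 1L << 30; // 1 GiB, an arbitrary threshold
        if (hasRoom(workDir, required)) {
            System.out.println("Using work directory " + workDir);
        } else {
            System.err.println("Not enough space in " + workDir
                + "; pass -D" + WORK_DIR_PROPERTY + "=<dir> to choose another");
        }
    }
}
```

Run with, for example, java -Dcrawl.work.dir=/data/nutch/tmp WorkDirCheck to
point the check at a larger partition. A check like this only reports the
problem earlier; it does not change where a job writes its intermediate data.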

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
