Hi Lewis,

Can you try the patch attached over here:
https://issues.apache.org/jira/browse/NUTCH-1483
Thanks,
Tejas

On Tue, Aug 6, 2013 at 7:24 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:

> Hi,
> Now using Nutch trunk 1.8-SNAPSHOT HEAD
> Back at this tonight. When attempting to fetch
>
> file://home/law/Downloads/asf/solr-4.3.1/example/e001 (notice two slashes)
>
> which contains loads of HTML files, I get the error as below.
>
>
> Fetcher: throughput threshold retries: 5
> -finishing thread FetcherThread, activeThreads=1
> org.apache.nutch.protocol.file.FileError: File Error: 404
> at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:118)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
> fetch of file://home/law/Downloads/asf/solr-4.3.1/example/e001 failed with:
> org.apache.nutch.protocol.file.FileError: File Error: 404
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: finished at 2013-08-06 18:59:00, elapsed: 00:00:02
>
> I then deleted the crawldb changed the seed URL to
>
> file:/home/law/Downloads/asf/solr-4.3.1/example/e001 (notice one slash)
>
> But when I eventually get fetching after a few rounds of generate, fetch,
> parse, updatedb, I am landed with
>
> fetching file:/home/law/Downloads/asf/solr-4.3.1/example/5428_03.html
> (queue crawl delay=500ms)
> org.apache.nutch.protocol.file.FileError: File Error: 404
> at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:118)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
> fetch of file:/home/law/Downloads/asf/solr-4.3.1/example/5428_03.html
> failed with: org.apache.nutch.protocol.file.FileError: File Error: 404
> fetching file:/home/law/Downloads/asf/solr-4.3.1/example/5094_08.html
> (queue crawl delay=500ms)
> org.apache.nutch.protocol.file.FileError: File Error: 404
> at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:118)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
> fetch of file:/home/law/Downloads/asf/solr-4.3.1/example/5094_08.html
> failed with: org.apache.nutch.protocol.file.FileError: File Error: 404
>
> Same as before... this happens with every single URL in the directory I am
> trying to crawl.
>
> Any advice here please?
> Thanks
> Lewis
>
> --
> *Lewis*
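
For reference, here is a small sketch of how generic URI parsing splits the two seed forms quoted above. It uses only java.net.URI from the JDK, not Nutch's own protocol-file code, and the class name FileUrlParts and its helper methods are made up for illustration. It may well not explain the 404s, since the one-slash form also failed; it only shows that with two slashes "home" is read as a host rather than as the first path segment.

```java
import java.net.URI;

// Hypothetical helper class, not part of Nutch.
public class FileUrlParts {

    // Returns the host component, or null when the URI has no authority part.
    public static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Returns the path component as java.net.URI parses it.
    public static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        // The two seed URLs from the quoted message.
        String[] seeds = {
            "file://home/law/Downloads/asf/solr-4.3.1/example/e001",
            "file:/home/law/Downloads/asf/solr-4.3.1/example/e001"
        };
        for (String seed : seeds) {
            System.out.println(seed);
            System.out.println("  host = " + hostOf(seed));
            System.out.println("  path = " + pathOf(seed));
        }
    }
}
```

With the two-slash seed the parsed path starts at /law/..., while the one-slash seed keeps the full /home/law/... path and has no host.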