I cannot confirm this when parsing a local 404 page. What do you get when fetching that page with:
bin/nutch org.apache.nutch.parse.ParserChecker http://wiki.example.org/INTERN_WIKI:Impressum

You should get a nice 404.

On Monday 01 August 2011 08:41:07 Christian Weiske wrote:
> Hello,
>
> I'm using the official nutch 1.3 distribution to crawl our internal
> mediawiki instance. Whenever a 404 is encountered, I get a
>
> > fetch of http://wiki.example.org/INTERN_WIKI:Impressum failed
> > with: java.net.SocketTimeoutException: Read timed out
>
> The page really does not exist:
> > $ curl -I http://wiki.example.org/INTERN_WIKI:Impressum
> > HTTP/1.1 404 Not Found
>
> So I think the error message is misleading. Is that a bug?

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

