Hello Dinçer,

> > > > Somewhere during the crawling process I get an error that stops
> > > > everything:
> > > > file:/home/cweiske/bin/apache-nutch-1.3/runtime/local/crawl-301/
> > > > segments/20110801090707
> > > > Exception in thread "main"
> > > > org.apache.hadoop.mapred.InvalidInputException: Input path does
> > > > not exist:

> > > I have had same problem in one of my instances. Let's dig
> > > together, at least. I have tried to re-crawl the url list into
> > > same crawl directory (crawl-301 in your case) and got the same
> > > error, will you confirm for your case?

> URLs does not matter actually. Same URLs may do it. Just try to do the
> crawling operation once more, just as in the first run. The thing is
> I am not out of disk space (for esp. tmp) and I can sometimes get it
> done without problems in this manner (yes I have some other problems
> such redirection).

I actually cannot reproduce the issue now. Very strange.

-- 
Best regards
Dipl.-Inf. Christian Weiske
Senior Developer
Netresearch GmbH & Co. KG

