Hello Dinçer,


> > > > Somewhere during the crawling process I get an error that stops
> > > > everything:
> > > > file:/home/cweiske/bin/apache-nutch-1.3/runtime/local/crawl-301/
> > > > segments/20110801090707
> > > > Exception in thread "main"
> > > > org.apache.hadoop.mapred.InvalidInputException: Input path does
> > > > not exist:
> > > I have had the same problem in one of my instances. Let's dig
> > > together, at least. I tried to re-crawl the URL list into the
> > > same crawl directory (crawl-301 in your case) and got the same
> > > error. Can you confirm this for your case?
> The URLs do not actually matter; the same URLs may trigger it. Just
> try the crawling operation once more, just as in the first run. The
> thing is, I am not out of disk space (especially for tmp), and I can
> sometimes get it done without problems this way (yes, I have some
> other problems, such as redirection).

I actually cannot reproduce the issue now. Very strange.

-- 
Best regards
Dipl.-Inf. Christian Weiske

Senior Developer
Netresearch GmbH & Co. KG
