I am using Nutch to crawl a local directory on my system. I have modified all
the conf files (default.xml, crawl-urlfilter, etc.), and I have also modified
HttpResponse.java, but it is still skipping all the URLs. Please help.
>
> I want to use Nutch 0.9 for whole-web crawling, but Nutch 0.9 does not have
> the admin command to create a crawldb. When I just execute nutch, the usage
> it displays says nothing about how to create a crawldb.
> I can't find any tutorial for Nutch 0.9, so I hope somebody can tell me how
> to create a crawldb.
: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
fetching file:///root/Desktop/csiro-split/CSIRO002
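
A possible cause, offered as an assumption rather than a confirmed diagnosis: "protocol not found for url=file" usually means that no enabled plugin handles the file: scheme. In Nutch the enabled plugins are selected by the plugin.includes property, and the stock value lists protocol-http but not protocol-file. The fragment below is a minimal sketch of an override in conf/nutch-site.xml; the exact plugin list is illustrative and should be adapted to the local setup.

```xml
<!-- conf/nutch-site.xml: sketch only; the plugin list below is an assumption -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- protocol-file replaces protocol-http so that file: URLs can be fetched -->
    <value>protocol-file|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  </property>
</configuration>
```

If the URLs are still skipped after this, it may be worth checking whether the URL filter file still contains a rule such as -^(file|ftp|mailto): which rejects file: URLs before they are fetched. Changes to HttpResponse.java are probably unrelated, since that class belongs to the HTTP protocol plugin.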
MOHIT GOYAL
CSE
200502013