re-crawling with nutch 1.8

Ali Nazemian Thu, 05 Jun 2014 12:26:11 -0700

Hi,
I recently became familiar with Nutch and I want to use it for whole-web
crawling. The problem is that I have not found any useful tutorial on how to
re-crawl with Nutch. I know that some configuration parameters have to be
changed for re-crawling, and I am aware of them. What I don't know is how to
run the crawler for an initial crawl first and then re-crawl in the following
steps. As far as I can tell, the default crawl script provided with Nutch
cannot be used for this purpose. Could somebody tell me how to do that? What
are the prerequisites? Do I need a web application server such as Tomcat for
this?
FYI, I am using Nutch 1.8.


Regards.

-- 
A.Nazemian
