Hi everybody, I am going to use Nutch to crawl some news websites. These websites are updated regularly, so I need to recrawl them at least every 2 hours. The problem is that I want an incremental recrawl: Nutch should fetch only the URLs that are new and have not been fetched before (except the main page of each site, which has to be refetched to extract new URLs). In each recrawl, I want only the new URLs to be fetched and sent to Solr for indexing. Could somebody guide me through this scenario with Nutch 1.8? Best regards.
-- A.Nazemian

