Hello Abdul,

By default, Nutch will not recrawl a page until its fetch interval has passed. Check this property, defined in conf/nutch-default.xml:
<!-- web db properties -->
<property>
  <name>db.fetch.interval.default</name>
  <value>2592000</value>
  <description>The default number of seconds between re-fetches of a page (30 days).
  </description>
</property>

Markus

-----Original message-----
> From: Abdul Munim <mu...@outlook.com>
> Sent: Sunday 19th June 2016 21:34
> To: user@nutch.apache.org
> Subject: Reindex Nutch periodically using cron job
>
> Hi,
>
> I've crawled a website using Nutch 1.12 and indexed it in Solr 6.1 using the
> command below:
>
> ==CODE==
> [root@2a563cff0511 nutch-latest]# bin/crawl -i \
> > -D solr.server.url=http://192.168.99.100:8983/solr/test/ urls/ crawl 5
> ==END_CODE==
>
> When I run the above command again, it says the following:
>
> ==CODE==
> [root@2a563cff0511 nutch-latest]# bin/crawl -i \
> > -D solr.server.url=http://192.168.99.100:8983/solr/test/ urls/ crawl 5
> Injecting seed URLs
> /opt/nutch-latest/bin/nutch inject crawl/crawldb urls/
> Injector: starting at 2016-06-19 15:29:08
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: overwrite: false
> Injector: update: false
> Injector: Total urls rejected by filters: 0
> Injector: Total urls injected after normalization and filtering: 1
> Injector: Total urls injected but already in CrawlDb: 1
> Injector: Total new urls injected: 0
> Injector: finished at 2016-06-19 15:29:13, elapsed: 00:00:05
> Sun Jun 19 15:29:13 UTC 2016 : Iteration 1 of 1
> Generating a new segment
> /opt/nutch-latest/bin/nutch generate -D mapreduce.job.reduces=2 -D
> mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false
> -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true
> crawl/crawldb crawl/segments -topN 50000 -numFetchers 1 -noFilter
> Generator: starting at 2016-06-19 15:29:15
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: false
> Generator: normalizing: true
> Generator: topN: 50000
> Generator: 0 records selected for fetching, exiting ...
> Generate returned 1 (no new segments created)
> Escaping loop: no more URLs to fetch now
> ==END_CODE==
>
> However, I have made some changes, i.e. a new file has been added and an
> existing file has been changed.
>
> Regards,
> Munim
>
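A minimal sketch of overriding db.fetch.interval.default, assuming the usual Nutch 1.x setup where local changes go into conf/nutch-site.xml instead of editing nutch-default.xml. The value 86400 (one day) is only an illustrative choice; pick an interval that matches how often the site changes. One caveat: the interval appears to be stored per URL in the CrawlDb when the URL is injected or scheduled. URLs already in the CrawlDb may therefore keep their 30-day interval, so check this against your Nutch version before relying on it.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides of conf/nutch-default.xml -->
<configuration>
  <property>
    <name>db.fetch.interval.default</name>
    <!-- Assumed example value: 86400 seconds = 1 day instead of the 30-day default (2592000) -->
    <value>86400</value>
    <description>The default number of seconds between re-fetches of a page (1 day).</description>
  </property>
</configuration>
```

With a shorter interval, a periodic bin/crawl run (for example from cron) should find pages due for fetching again once their interval has elapsed. Without that, the Generator exits with "0 records selected for fetching".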