Hello Abdul,

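By default, Nutch will not re-fetch a page until a set interval has passed since its last fetch, which is why the Generator reports 0 records selected for fetching. Check this property in conf/nutch-default.xml: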

<!-- web db properties -->
<property>
  <name>db.fetch.interval.default</name>
  <value>2592000</value>
  <description>The default number of seconds between re-fetches of a page (30 days).
  </description>
</property>
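If you want pages to become due sooner, one option is to override the property in conf/nutch-site.xml, which takes precedence over nutch-default.xml. Below is a minimal shell sketch that prints such an override for a given number of days. The helper name print_interval_override is hypothetical, and the one-day value is only an example.

```shell
#!/bin/sh
# Hypothetical helper: prints a db.fetch.interval.default override
# suitable for conf/nutch-site.xml, given an interval in days.
print_interval_override() {
  days=$1
  # The property is expressed in seconds: days * 24 h * 60 min * 60 s.
  seconds=$((days * 24 * 60 * 60))
  cat <<EOF
<property>
  <name>db.fetch.interval.default</name>
  <value>${seconds}</value>
  <description>Seconds between re-fetches of a page (${days} days).</description>
</property>
EOF
}

# Example: a one-day interval (86400 seconds).
print_interval_override 1
```

Paste the printed <property> block inside <configuration> in conf/nutch-site.xml, then rerun bin/crawl, for example from your cron job. URLs that are already in the CrawlDb may keep the next-fetch time that was computed at their last fetch, so the shorter interval may not apply to them right away.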

Markus

-----Original message-----
> From:Abdul Munim <mu...@outlook.com>
> Sent: Sunday 19th June 2016 21:34
> To: user@nutch.apache.org
> Subject: Reindex Nutch periodically using cron job
> 
> Hi,
> 
> I've crawled a website using Nutch 1.12 and indexed it in Solr 6.1 using the 
> below command:
> 
> ==CODE==
> [root@2a563cff0511 nutch-latest]# bin/crawl -i \
> > -D solr.server.url=http://192.168.99.100:8983/solr/test/ urls/ crawl 5
> ==END_CODE==
> 
> When I run the above command again, it says the following:
> 
> ==CODE==
> [root@2a563cff0511 nutch-latest]# bin/crawl -i \
> > -D solr.server.url=http://192.168.99.100:8983/solr/test/ urls/ crawl 5
> Injecting seed URLs
> /opt/nutch-latest/bin/nutch inject crawl/crawldb urls/
> Injector: starting at 2016-06-19 15:29:08
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: overwrite: false
> Injector: update: false
> Injector: Total urls rejected by filters: 0
> Injector: Total urls injected after normalization and filtering: 1
> Injector: Total urls injected but already in CrawlDb: 1
> Injector: Total new urls injected: 0
> Injector: finished at 2016-06-19 15:29:13, elapsed: 00:00:05
> Sun Jun 19 15:29:13 UTC 2016 : Iteration 1 of 1
> Generating a new segment
> /opt/nutch-latest/bin/nutch generate -D mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true crawl/crawldb crawl/segments -topN 50000 -numFetchers 1 -noFilter
> Generator: starting at 2016-06-19 15:29:15
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: false
> Generator: normalizing: true
> Generator: topN: 50000
> Generator: 0 records selected for fetching, exiting ...
> Generate returned 1 (no new segments created)
> Escaping loop: no more URLs to fetch now
> ==END_CODE==
> 
> However, I have made some changes since then: a new file has been added and an 
> existing file has been changed.
> 
> Regards,
> Munim
> 
