All,
I use Nutch to crawl a couple of internal websites and index the crawl results into Solr. Periodically, URLs get removed from these websites, and I am noticing that the documents in the index corresponding to these deleted URLs do not get cleaned up. My db.fetch.interval.default is set to 86400 seconds (24 hrs).

The following is the command I use to index crawled documents into Solr:

$NUTCH_HOME/bin/nutch solrindex $solr_endpoint crawl/crawldb crawl/linkdb crawl/segments/*

Can you please tell me what I am doing wrong? Does Nutch/Solr indexing not see that a URL has been deleted and therefore needs to be removed from the Solr index?

Thanks so much in advance,
Raj
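P.S. For reference, the fetch interval mentioned above is set in conf/nutch-site.xml using the standard Hadoop-style property format (a sketch; the surrounding <configuration> element and any other properties in my file are omitted):

```xml
<!-- Excerpt from conf/nutch-site.xml: how often a page is re-fetched, in seconds. -->
<property>
  <name>db.fetch.interval.default</name>
  <value>86400</value> <!-- 24 hours -->
</property>
```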

