Hi, I am also very interested in this; great to see this patch coming.
Would it be possible to apply it to a 1.2 version of nutch? Is there some timeframe for when 1.3 will be released?

best regards,
Magnus

On Thu, Mar 10, 2011 at 2:53 PM, Markus Jelsma <[email protected]> wrote:
> There's a patch for Nutch 1.3 that does the trick:
> https://issues.apache.org/jira/browse/NUTCH-963
>
> On Thursday 10 March 2011 15:48:13 Nemani, Raj wrote:
>> All,
>>
>> I use Nutch to crawl couple of internal websites and index the crawl
>> results into Solr. Periodically Urls get removed from these websites
>> and I am noticing that the documents existing in the index corresponding
>> to these deleted Urls do not get cleaned up.
>>
>> My db.fetch.interval.default is set to 86400 seconds (24 hrs)
>>
>> The following is the command I use to index crawled documents to Solr
>>
>> $NUTCH_HOME/bin/nutch solrindex $solr_endpoint crawl/crawldb
>> crawl/linkdb crawl/segments/*
>>
>> Can you please tell me what I am doing wrong? Is Nutch/Solr indexing not
>> seeing the fact that there is a deleted Url that needs to be deleted
>> from the solr index?
>>
>> Thanks so much in advance
>>
>> Raj
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

