Hi,

I am also very interested in this; great to see this patch coming.

Would it be possible to apply it to a 1.2 version of Nutch?

Is there some timeframe for when 1.3 will be released?

best regards,
Magnus

On Thu, Mar 10, 2011 at 2:53 PM, Markus Jelsma
<[email protected]> wrote:
> There's a patch for Nutch 1.3 that does the trick:
> https://issues.apache.org/jira/browse/NUTCH-963
>
> On Thursday 10 March 2011 15:48:13 Nemani, Raj wrote:
>> All,
>>
>> I use Nutch to crawl a couple of internal websites and index the crawl
>> results into Solr.  Periodically, URLs get removed from these websites,
>> and I am noticing that the documents in the index corresponding
>> to these deleted URLs do not get cleaned up.
>>
>> My db.fetch.interval.default is set to 86400 seconds (24 hrs).
>>
>> The following is the command I use to index crawled documents to Solr:
>>
>> $NUTCH_HOME/bin/nutch solrindex $solr_endpoint crawl/crawldb
>> crawl/linkdb crawl/segments/*
>>
>> Can you please tell me what I am doing wrong? Is the Nutch/Solr indexing
>> step not detecting that a URL has been removed and needs to be deleted
>> from the Solr index?
>>
>> Thanks so much in advance
>>
>> Raj
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
