Shaun:

You could try the NRT (near-real-time) search available in Solr with RankingAlgorithm; see the link below. It lets you add docs in real time and also query them in real time. If DIH does not retain the old index, you may be able to convert the RSS fields to the XML format Solr expects and update the docs yourself (make sure each doc has a unique id).

http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x
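
To illustrate the RSS-to-XML conversion mentioned above, here is a rough Python sketch. It is only an illustration under some assumptions: the Solr field names (id, title, link, description) are hypothetical and need to match your schema.xml, and it assumes each item's guid (or, failing that, its link) is stable enough to serve as the unique id.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from RSS <item> child tags to Solr field names;
# adjust it to the fields actually defined in your schema.xml.
FIELD_MAP = {"title": "title", "link": "link", "description": "description"}


def rss_to_solr_add_xml(rss_text):
    """Convert the <item> elements of an RSS 2.0 feed into a Solr <add> message."""
    add = ET.Element("add")
    for item in ET.fromstring(rss_text).iter("item"):
        # Assumption: guid (falling back to link) is unique and stable per item.
        uid = (item.findtext("guid") or item.findtext("link") or "").strip()
        if not uid:
            continue  # skip items that cannot be given a unique id
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = uid
        for tag, solr_field in FIELD_MAP.items():
            value = item.findtext(tag)
            if value:
                ET.SubElement(doc, "field", name=solr_field).text = value.strip()
    return ET.tostring(add, encoding="unicode")
```

The resulting string can be POSTed to Solr's XML update handler, followed by a commit. Because Solr overwrites a doc that has the same unique id, re-posting the feed on each poll should not create duplicates, and items that have dropped off the feed stay in the index. If you also map pubDate to a Solr date field, it would first need converting to Solr's ISO 8601 UTC date format.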

You can download Solr 3.4.0 with RankingAlgorithm 1.3 from here:
http://solr-ra.tgels.org

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 11/6/2011 1:22 PM, Shaun Barriball wrote:
Hi all,

We've successfully set up Solr 3.4.0 to parse and import multiple news RSS feeds 
(based on the slashdot example on 
http://wiki.apache.org/solr/DataImportHandler) using the HttpDataSource.
The objective is for Solr to index ALL news items published on this feed (ever) - not just the current contents of the feed. I've read that the delta import is not supported for XML imports. I've therefore tried to use "command=full-import&clean=false".
But the number of Documents Processed still seems to be stuck at a fixed number 
of items, judging by the Stats and the 'numFound' result for a generic '*:*' 
search. New items are being added to the feeds all the time (and old ones 
are dropping off).

Is it possible for Solr to incrementally build an index of a live RSS feed 
which is changing but retain the index of its archive?

All help appreciated.
Shaun
