Shaun:
You could try the NRT (near-real-time) search available in Solr with
RankingAlgorithm. It should let you add docs in real time and also
query them in real time. If DIH does not retain the old index, you may
be able to convert the RSS fields to the XML format Solr needs and
update the docs yourself (make sure each doc has a unique id):
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x
You can download Solr 3.4.0 with RankingAlgorithm 1.3 from here:
http://solr-ra.tgels.org
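The RSS-to-Solr-XML conversion suggested above could look roughly like the sketch below. It is only an illustration under assumptions, not a tested recipe for your feeds: the field names in FIELD_MAP, the choice of the item's link as the unique id, and the update URL are all hypothetical and need to match your schema.xml and Solr install.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical mapping from RSS <item> child tags to Solr field names;
# adjust it to whatever your schema.xml actually defines.
FIELD_MAP = {"link": "id", "title": "title", "description": "description"}


def rss_to_solr_add_xml(rss_text):
    """Convert the <item> elements of an RSS 2.0 document into a Solr <add> message."""
    rss_root = ET.fromstring(rss_text)
    add = ET.Element("add")
    for item in rss_root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # no usable unique id, so skip this item
        doc = ET.SubElement(add, "doc")
        for rss_tag, solr_field in FIELD_MAP.items():
            value = (item.findtext(rss_tag) or "").strip()
            if value:
                field = ET.SubElement(doc, "field", name=solr_field)
                field.text = value
    return ET.tostring(add, encoding="unicode")


def post_to_solr(add_xml, update_url="http://localhost:8983/solr/update?commit=true"):
    """POST the <add> message to Solr's XML update handler.

    The default URL is an assumption (a stock local install); change as needed.
    """
    request = urllib.request.Request(
        update_url,
        data=add_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

If the id field is the uniqueKey in the schema, re-posting an item that is still in the feed should overwrite the same doc rather than duplicate it, while items that have dropped off the feed stay in the index.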
Regards,
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org
On 11/6/2011 1:22 PM, Shaun Barriball wrote:
Hi all,
We've successfully set up Solr 3.4.0 to parse and import multiple news RSS feeds
(based on the slashdot example on
http://wiki.apache.org/solr/DataImportHandler) using the HttpDataSource.
The objective is for Solr to index ALL news items ever published on these feeds - not just the current contents of each feed. I've read that delta import is not supported for XML imports. I've therefore tried to use "command=full-import&clean=false".
But the number of Documents Processed still seems to be stuck at a fixed number
of items, judging by the Stats and the 'numFound' result for a generic '*:*'
search. New items are being added to the feeds all the time (and old ones
are dropping off).
Is it possible for Solr to incrementally build an index of a live RSS feed
which is changing but retain the index of its archive?
All help appreciated.
Shaun