On 2/11/2014 2:37 PM, shamik wrote:
Eric,

   Thanks for your reply. I should have given more context. I'm currently
running an incremental crawl daily on this particular source and indexing
the documents. The incremental crawl looks for any changes since the last crawl
date, based on the document publish date. But there's no way for me to know if a
document has been deleted. To handle that, I run a full crawl each weekend,
which re-indexes the entire content. After the full index is done, I
call a purge script, which deletes any content that is more than 24 hours
old, based on the indextimestamp field.
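The purge step described above amounts to a single delete-by-query against indextimestamp. The sketch below only builds the request body; it assumes the JSON update format is POSTed to the core's /update handler, and the class and method names are hypothetical. It shows two equivalent ways to express "older than 24 hours": Solr date math evaluated on the server, or an explicit cutoff computed on the client.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class PurgeQuery {
    // Field name taken from the thread; adjust to your schema.
    static final String FIELD = "indextimestamp";

    // Cutoff expressed with Solr date math, evaluated by the server at request time.
    static String dateMathQuery() {
        return FIELD + ":[* TO NOW-1DAY]";
    }

    // Same cutoff computed on the client as an ISO-8601 UTC instant.
    static String explicitQuery(Instant now) {
        Instant cutoff = now.minus(24, ChronoUnit.HOURS);
        return FIELD + ":[* TO " + cutoff + "]";
    }

    // JSON delete-by-query command (the query contains no quotes, so no escaping).
    static String deleteCommand(String query) {
        return "{\"delete\":{\"query\":\"" + query + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(deleteCommand(dateMathQuery()));
        System.out.println(deleteCommand(explicitQuery(Instant.parse("2014-02-11T21:37:00Z"))));
    }
}
```

The deletes become visible only after a commit, for example by adding commit=true to the /update request.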

The issue with atomic updates is that they don't alter the indextimestamp
field. So even if I run a full crawl with atomic updates, the timestamp
keeps its old value. Unfortunately, I can't rely on another date field
coming from the source, because those fields are not consistent. As a
result, I can't remove stale content.
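Why the timestamp sticks can be shown with a simplified model of an atomic update. Solr rebuilds the whole document from its stored fields and then applies the per-field operations. Because indextimestamp is already present in the rebuilt document, a schema default such as default="NOW" never applies. That the field is stored and filled by such a default is an assumption here. The class below is a hypothetical plain-Java model, not Solr code, and it handles only the "set" operation.

```java
import java.util.HashMap;
import java.util.Map;

public class AtomicMergeDemo {
    // Simplified merge: copy the stored document, apply each {"set": value}
    // operation, and leave every other stored field (including
    // indextimestamp) with its old value.
    static Map<String, Object> merge(Map<String, Object> stored, Map<String, Object> update) {
        Map<String, Object> result = new HashMap<>(stored);
        for (Map.Entry<String, Object> e : update.entrySet()) {
            Object v = e.getValue();
            if (v instanceof Map) {
                Map<?, ?> op = (Map<?, ?>) v;
                if (op.containsKey("set")) {
                    result.put(e.getKey(), op.get("set"));
                }
            } else {
                result.put(e.getKey(), v); // plain values, e.g. the "id" key
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("id", "doc1");
        stored.put("title", "Old title");
        stored.put("indextimestamp", "2014-02-01T03:00:00Z");

        Map<String, Object> setTitle = new HashMap<>();
        setTitle.put("set", "New title");
        Map<String, Object> update = new HashMap<>();
        update.put("id", "doc1");
        update.put("title", setTitle);

        Map<String, Object> merged = merge(stored, update);
        System.out.println(merged.get("title"));          // New title
        System.out.println(merged.get("indextimestamp")); // 2014-02-01T03:00:00Z
    }
}
```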

One possibility is this: when you send the atomic update to Solr, include a new value for the indextimestamp field, such as a "set" operation with the current time.
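The suggestion above can be sketched as a JSON atomic update that sets the changed field and also sets indextimestamp to the crawl time. This is a minimal sketch with several assumptions: the uniqueKey field is "id", "title" stands in for whatever field changed, the body is POSTed to the /update handler, and the class name is hypothetical. The escaping helper covers only quotes and backslashes, so a real client should use a proper JSON library.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class AtomicUpdateBody {
    // Minimal JSON string quoting: escapes backslashes first, then double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One-document atomic update. Besides the changed field, indextimestamp
    // is explicitly "set" so the weekend purge treats the document as fresh.
    static String build(String id, String field, String value, Instant crawlTime) {
        // Solr dates carry millisecond precision, so drop anything finer.
        String ts = crawlTime.truncatedTo(ChronoUnit.MILLIS).toString();
        return "[{\"id\":" + quote(id) + ","
                + quote(field) + ":{\"set\":" + quote(value) + "},"
                + "\"indextimestamp\":{\"set\":" + quote(ts) + "}}]";
    }

    public static void main(String[] args) {
        System.out.println(build("doc1", "title", "New title",
                Instant.parse("2014-02-16T03:00:00Z")));
    }
}
```

Using one crawl-start time for every document in a run keeps the purge cutoff simple: anything with an older timestamp was not seen by the full crawl.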

Another option: you can write a custom update processor plugin for Solr. Once it is configured in an update chain, it runs on each incoming document. Depending on what it finds in the update request, it can make whatever changes you need, such as setting indextimestamp to the current time. You can do pretty much anything.

http://wiki.apache.org/solr/UpdateRequestProcessor
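The per-document logic of such a processor might look like the sketch below. It is a plain-Java stand-in: a Map replaces SolrInputDocument, and the stamp method plays the role that processAdd() would play in a real UpdateRequestProcessor, so none of Solr's classes are used here. The rule for recognizing an atomic update (any field value being a Map of operations) is a simplifying assumption, and the class name is hypothetical.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class TimestampStamper {
    static final String FIELD = "indextimestamp";

    // Treat a document as an atomic update if any field value is a Map of
    // operations such as {"set": ...}.
    static boolean isAtomic(Map<String, Object> doc) {
        for (Object v : doc.values()) {
            if (v instanceof Map) {
                return true;
            }
        }
        return false;
    }

    // Overwrite the timestamp on every incoming document: as a "set"
    // operation for atomic updates, as a plain value for full documents.
    static void stamp(Map<String, Object> doc, Instant now) {
        String ts = now.toString();
        if (isAtomic(doc)) {
            Map<String, Object> op = new HashMap<>();
            op.put("set", ts);
            doc.put(FIELD, op);
        } else {
            doc.put(FIELD, ts);
        }
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-02-16T03:00:00Z");

        Map<String, Object> full = new HashMap<>();
        full.put("id", "doc1");
        stamp(full, now);
        System.out.println(full.get(FIELD));   // 2014-02-16T03:00:00Z

        Map<String, Object> setTitle = new HashMap<>();
        setTitle.put("set", "New title");
        Map<String, Object> atomic = new HashMap<>();
        atomic.put("id", "doc2");
        atomic.put("title", setTitle);
        stamp(atomic, now);
        System.out.println(atomic.get(FIELD)); // {set=2014-02-16T03:00:00Z}
    }
}
```

Doing this on the server means the crawler needs no changes, at the cost of deploying and configuring a plugin.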

Writing an update processor in Java typically gives the most flexibility and the best performance, but you can also write one in other programming languages:

http://wiki.apache.org/solr/ScriptUpdateProcessor

Thanks,
Shawn
