I would like to be able to do a delta import keyed on arbitrary data rather than a last-modified date. Specifically, our database has an auto_increment field called DID, or document identifier. For changes to existing data, this field is updated any time a row is changed in any way, effectively turning the row into a new document. On the indexing side, we delete the old document and insert the new one.

We are currently using a pricey commercial indexing product (which we know is based on Lucene) and are in the process of developing a replacement with distributed Solr. The dividing line between indexed and new data is the highest DID in the existing data set, which we track and only update when new data is successfully indexed.
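
To make the tracking logic above concrete, here is a minimal Python sketch, not a Solr configuration. Everything beyond the DID column is an assumption made for illustration: the `docs` table, the `doc_key` column (a hypothetical stable key used to find the old document), the `body` column, the function names, and the plain dict standing in for the index.

```python
import sqlite3


def fetch_delta(conn, last_did):
    """Return rows whose DID is above the tracked high-water mark, in DID order."""
    cur = conn.execute(
        "SELECT did, doc_key, body FROM docs WHERE did > ? ORDER BY did",
        (last_did,),
    )
    return cur.fetchall()


def apply_delta(index, rows, last_did):
    """Apply rows to a staged copy of the index.

    Returns (new_index, new_mark). If anything raises part-way through,
    the caller still holds the old index and the old mark, so the mark
    only advances when the whole batch has been applied.
    """
    staged = dict(index)
    new_mark = last_did
    for did, doc_key, body in rows:
        staged.pop(doc_key, None)      # delete the old document, if any
        staged[doc_key] = (did, body)  # insert the new one
        new_mark = max(new_mark, did)
    return staged, new_mark


def demo():
    # In-memory database standing in for the real one (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (did INTEGER PRIMARY KEY, doc_key TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)",
        [(1, "a", "first"), (2, "b", "second")],
    )

    # Initial pass: everything is above a mark of 0.
    index, mark = apply_delta({}, fetch_delta(conn, 0), 0)

    # A change to an existing row gives it a new, higher DID.
    conn.execute(
        "UPDATE docs SET did = 3, body = 'first, edited' WHERE doc_key = 'a'"
    )

    # Delta pass: only the row with DID above the mark is picked up.
    index, mark = apply_delta(index, fetch_delta(conn, mark), mark)
    return index, mark
```

Calling `demo()` runs one initial pass and one delta pass. The second pass sees only the changed row, replaces the old document for that key, and moves the mark up to the new highest DID.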

If there's a better way to do this already (multiple cores and index merging?), I'm all ears. We are not very far along, so we have a couple of weeks left to define our approach.

Thanks,
Shawn

On 2/24/2010 6:42 AM, Grant Ingersoll wrote:
What would it be?
