I would like to be able to do a delta import keyed on arbitrary data
rather than a last-modified date. Specifically, our database has an
auto_increment field called DID, or document identifier. For changes to
existing data, this field is updated anytime a row is changed in any way, effectively
turning it into a new document. On the indexing side, we delete the old
document and insert the new one.
We are currently using a pricy commercial indexing product (which we
know is based on Lucene) and are in the process of developing a
replacement with distributed Solr. The dividing line between indexed
and new data is the highest DID in the existing data set, which we track
and only update when new data is successfully indexed.
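To make the dividing-line idea concrete, here is a minimal Python sketch of the tracking logic, not our actual code. It assumes a table named docs with columns did and body, uses sqlite3 as a stand-in for the real database, and takes a hypothetical index_batch callable that sends rows to the indexer and raises on failure. All of those names are made up for illustration. Deleting the old copy of a changed document is left out, since that depends on how the old document is identified.

```python
import sqlite3


def fetch_new_rows(conn, last_did):
    """Return (did, body) rows above the high-water mark, in DID order."""
    cur = conn.execute(
        "SELECT did, body FROM docs WHERE did > ? ORDER BY did",
        (last_did,),
    )
    return cur.fetchall()


def delta_import(conn, last_did, index_batch):
    """Index rows newer than last_did and return the new high-water mark.

    index_batch is a hypothetical callable that raises if indexing fails.
    The mark advances only when it returns normally, so a failed run is
    retried from the same DID next time.
    """
    rows = fetch_new_rows(conn, last_did)
    if not rows:
        return last_did  # nothing new, mark is unchanged
    index_batch(rows)    # an exception here leaves the mark where it was
    return rows[-1][0]   # highest DID that was just indexed
```

The caller would persist the returned value and pass it back in on the next run. Because an update gives a row a new, higher DID, changed rows show up in the same query as brand-new ones.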
If there's a better way to do this already (multiple cores and index
merging?), I'm all ears. We are not very far along, so we have a couple
of weeks left to define our approach.
Thanks,
Shawn
On 2/24/2010 6:42 AM, Grant Ingersoll wrote:
What would it be?