I use Solr + MySQL with data coming from several DIH-type "loaders" that I have written to move data from many different databases into my "BI" solution. I don't use DIH itself because I am not simply replicating the data; I am moving, merging, and processing the incoming data during loading.
I have an aspect (AspectJ) that wraps my Data Access Object. Every time "persist" is called (I am using Hibernate), @Around advice updates Solr with the same data an instant later. This handles nearly every event during the day. I have a simple "retry" procedure around my SolrJ add/commit calls for network errors, in the hope that the update will eventually succeed.

To cover any errors that slip through, I rebuild the Solr index from scratch each night from the data in MySQL. That takes about 10 minutes. This gives me "eventual consistency" for any issues that cropped up during the day.

Obviously the size of my database (< 2 million records) makes this approach manageable. YMMV.

Tim

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Tuesday, March 15, 2011 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: keeping data consistent between Database and Solr

On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote:
> But my main question is, how do I guarantee that data between my Cassandra
> database and Solr index are consistent and up-to-date?

Our MySQL database has two unique indexes. One is a document ID, implemented in MySQL as an autoincrement integer and in Solr as a long. The other is what we call a tag id, implemented in MySQL as a varchar and in Solr as a single lowercased token, and serving as Solr's uniqueKey. We have an update trigger on the database that updates the document ID whenever the database document is updated.

We have a homegrown build system for Solr. In a nutshell, it keeps track of the newest document ID in the Solr index. If the DIH delta-import fails, it doesn't update the stored ID, which means that on the next run it will try to index those documents again. Changes to the entries in the database are automatically picked up because the document ID is newer, but the tag id doesn't change, so the document in Solr is overwritten.
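
The bookkeeping described above could be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the actual build system: the class DeltaBuild, the Row type, and the indexer callback are all hypothetical stand-ins. A real version would select rows from MySQL and hand them to DIH or SolrJ instead of an in-memory list and a callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a delta build driven by a stored "newest document ID". */
class DeltaBuild {
    /** One database row: docId is bumped on every update, tagId is the stable uniqueKey. */
    static final class Row {
        final long docId;
        final String tagId;
        Row(long docId, String tagId) { this.docId = docId; this.tagId = tagId; }
    }

    // Newest document ID known to have reached the index.
    private long lastIndexedDocId;

    DeltaBuild(long startingDocId) { this.lastIndexedDocId = startingDocId; }

    long lastIndexedDocId() { return lastIndexedDocId; }

    /**
     * Passes every row newer than the stored ID to the indexer (assumed to
     * overwrite by tagId). The stored ID advances only if the indexer returns
     * normally, so a failed run is retried with the same rows next time.
     */
    boolean run(List<Row> table, Consumer<List<Row>> indexer) {
        List<Row> delta = new ArrayList<>();
        long newest = lastIndexedDocId;
        for (Row r : table) {
            if (r.docId > lastIndexedDocId) {
                delta.add(r);
                newest = Math.max(newest, r.docId);
            }
        }
        try {
            indexer.accept(delta);
        } catch (RuntimeException e) {
            return false; // stored ID untouched; these rows are picked up again
        }
        lastIndexedDocId = newest;
        return true;
    }
}
```

Because the stored ID only moves after a successful import, re-sending a row is harmless as long as the index overwrites on the unique key, which is the role the tag id plays here.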

Things are actually more complex than I've written, because our index is distributed. Hopefully it can give you some ideas for yours.

Shawn