Re: Best practices to rebuild index on live system

Jonathan Rochkind Thu, 11 Nov 2010 15:57:48 -0800

You can do something similar to your case #1 with Solr replication, which handles a lot of the details for you instead of you manually switching cores and such. Index to a new core, then tell your production Solr to be a slave replicating from that new master core. It may still have some of the same downsides as your scenario #1, since it's essentially the same thing, but with Solr replication taking care of some of the nuts and bolts for you.
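As a rough sketch of what "tell your production Solr to replicate from the new core" could look like: the replication handler accepts a one-off `fetchindex` command with a `masterUrl` parameter. The snippet below only builds that request URL. The host names, port, and core names are made up for illustration, and it assumes the replication handler is mounted at `/replication` on both cores, so check it against your own solrconfig.xml.

```python
from urllib.parse import urlencode


def fetchindex_url(slave_core_url, master_core_url):
    """Build a one-off 'pull the index from this master' request URL.

    Assumes both cores expose the replication handler at /replication;
    the URLs passed in are hypothetical examples, not real hosts.
    """
    query = urlencode({
        "command": "fetchindex",
        "masterUrl": master_core_url + "/replication",
    })
    return slave_core_url + "/replication?" + query


# Hypothetical hosts: production serves core A, the rebuild went into core A1.
url = fetchindex_url("http://prod:8983/solr/A", "http://build:8983/solr/A1")
print(url)
```

Issuing a GET against the resulting URL should ask the production core to fetch the freshly built index once; a permanent master/slave setup would instead go in the replication handler's configuration.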

I haven't heard of any better solutions. In general, Solr doesn't seem to be all that great at use cases where the index changes frequently in response to user actions; it doesn't seem to have really been designed for that.

You could store all your user-created data in an external store (RDBMS or NoSQL) as well as indexing it; then when you rebuild the index you can pull everything from there and won't lose anything. What seems to work best, in line with Solr's assumptions, is never to treat a Solr index as the canonical storage location of any data. Solr isn't really designed to be storage, it's designed to be an index. Always keep the canonical copy of any data in an actual store, with Solr just being an index over it. That approach tends to make things like this easier to work out, although there can still be some tricks. (For example, after you're done building your new index, but before you replicate it to production, you might have to check the canonical store for any data that changed since you started the re-index, and then re-index that. And then any data that changed between the time your second re-index began and... this could go on forever.)
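To make the "re-index whatever changed, then check again" idea concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: `CanonicalStore` stands in for the real database, a plain dict stands in for the new Solr core, and a logical clock stands in for a last-modified column. It does not handle deletes, and it caps the number of passes, since under constant writes the loop might never settle.

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalStore:
    """Hypothetical stand-in for the RDBMS/NoSQL store of record."""
    rows: dict = field(default_factory=dict)  # doc id -> (value, modified_at)
    clock: int = 0                            # logical "last modified" counter

    def write(self, doc_id, value):
        self.clock += 1
        self.rows[doc_id] = (value, self.clock)

    def changed_since(self, since):
        return {i: v for i, (v, t) in self.rows.items() if t > since}


def rebuild_with_catch_up(store, index, max_passes=5, on_pass=None):
    """Index everything, then keep re-indexing changes until a pass finds none.

    Returns True if the index caught up, False if max_passes ran out.
    `on_pass` is a test hook that simulates writes arriving mid-rebuild.
    """
    watermark = 0  # first pass sees every row, i.e. the full rebuild
    for _ in range(max_passes):
        high = store.clock  # note the clock before reading the changes
        changed = store.changed_since(watermark)
        if not changed:
            return True
        index.update(changed)
        watermark = high
        if on_pass:
            on_pass()
    return False


store = CanonicalStore()
store.write("a", 1)
store.write("b", 2)

late_writes = iter([("c", 3)])  # a write that lands during the first pass


def simulate_traffic():
    for doc_id, value in late_writes:
        store.write(doc_id, value)


new_index = {}
caught_up = rebuild_with_catch_up(store, new_index, on_pass=simulate_traffic)

# A store that is written to after every pass never lets the loop settle.
busy = CanonicalStore()
busy.write("x", 0)
never_done = rebuild_with_catch_up(
    busy, {}, max_passes=3, on_pass=lambda: busy.write("x", busy.clock)
)
```

When the loop returns False you are in the "this could go on forever" case, and would probably need a short write pause for the final pass, or accept a small window of staleness.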


Robert Gründler wrote:

Hi again,

We're coming closer to the rollout of our newly created Solr/Lucene-based search, and I'm wondering how people handle changes to their schema on live systems.

In our case, we have 3 cores (i.e. A, B, C), where the largest one takes about 1.5 hours for a full dataimport from the relational database. The index is being updated in real time, through post insert/update/delete events in our ORM.

So far, I can only think of 2 scenarios for rebuilding the index if we need to update the schema after the rollout:

1. Create 3 more cores (A1, B1, C1), import the data from the database, and after importing switch the application to cores A1, B1, C1.

This will most likely cause a corrupt index, as the database might receive inserts/updates/deletes during the 1.5 hours of indexing.

2. Put the live system in a read-only mode and rebuild the index during that time. This will ensure data integrity in the index, with the drawback that users are not able to write to the app.

Does Solr provide any built-in approaches to this problem?


best

-robert
