You can do something similar to your scenario #1 with Solr replication, which handles a lot of the details for you instead of you manually switching cores and such. Index to a new core, then tell your production Solr to be a slave replicating from that new core as its master. It may still have some of the same downsides as your scenario #1, since it's essentially the same thing, but with Solr replication taking care of some of the nuts and bolts for you.
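
Just to illustrate, here's a minimal sketch in Python (using the requests library) of kicking off a one-time pull on the production core from the rebuilt one via Solr's ReplicationHandler. The host names, core names and the /replication handler registration are assumptions about your setup, not something from your mail:

import requests

# Hypothetical URLs -- adjust to your own hosts and core names.
MASTER_CORE = "http://indexer:8983/solr/A1"   # core holding the freshly rebuilt index
SLAVE_CORE = "http://prod:8983/solr/A"        # live core serving queries

# Ask the production core to pull the new index from the rebuilt core.
# Assumes the ReplicationHandler is registered at /replication in
# solrconfig.xml on both cores.
resp = requests.get(
    SLAVE_CORE + "/replication",
    params={
        "command": "fetchindex",
        "masterUrl": MASTER_CORE + "/replication",
    },
)
resp.raise_for_status()

# Optionally poll replication status while the copy runs.
status = requests.get(
    SLAVE_CORE + "/replication",
    params={"command": "details", "wt": "json"},
)
print(status.json())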

I haven't heard of any better solutions. In general, Solr isn't really great at use cases where the index changes frequently in response to user actions; it doesn't seem to have been designed that way.

You could store all your user-created data in an external store (RDBMS or NoSQL) as well as indexing it; then when you rebuild the index you can pull it all from there, so you won't lose any of it. It tends to work best, and to fit Solr's assumptions, if you never treat a Solr index as the canonical storage location for any data -- Solr isn't really designed to be storage, it's designed to be an index. Keep the canonical copy of every piece of data in an actual store, with Solr just being an index over it. That approach tends to make problems like this one easier to work out, although there can still be some tricks. (For example, after you're done building your new index, but before you replicate it to production, you may have to check the canonical store for any data that changed between the time your re-index started and now, and re-index just that. Then there's the data that changed while that second pass ran, and so on -- in principle this could go on forever, though each pass covers a much smaller window than the last.)
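
As a rough sketch of that catch-up step, assuming the canonical store is an RDBMS with an updated_at column and using the pysolr client -- all the table, column and connection names here are made up for illustration:

import datetime

import psycopg2   # or whatever driver your canonical store uses
import pysolr

# Hypothetical connection details, table and column names.
solr = pysolr.Solr("http://indexer:8983/solr/A1")   # the core being rebuilt
conn = psycopg2.connect("dbname=app")

def reindex_changes_since(since):
    """Re-index rows touched after `since`; returns the new cutoff time."""
    cutoff = datetime.datetime.utcnow()
    cur = conn.cursor()
    cur.execute(
        "SELECT id, title, body FROM documents WHERE updated_at >= %s",
        (since,),
    )
    docs = [{"id": row[0], "title": row[1], "body": row[2]} for row in cur]
    if docs:
        solr.add(docs)
    return cutoff

# Record this timestamp before the full import starts.
bulk_import_started_at = datetime.datetime.utcnow()
# ... run the 1.5 hour full import here ...

# Afterwards, sweep up whatever changed while it ran; each pass covers a
# much smaller window than the last.  Deletes need separate handling,
# e.g. a deleted_at column turned into solr.delete(id=...) calls.
since = bulk_import_started_at
for _ in range(3):
    since = reindex_changes_since(since)
solr.commit()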

Robert Gründler wrote:
Hi again,

we're coming closer to the rollout of our newly created Solr/Lucene based
search, and I'm wondering how people handle changes to their schema on live
systems.
In our case, we have 3 cores (i.e. A, B, C), where the largest one takes about
1.5 hours for a full data import from the relational database. The index is
being updated in realtime, through post insert/update/delete events in our ORM.

So far, I can only think of 2 scenarios for rebuilding the index if we need to
update the schema after the rollout:

1. Create 3 more cores (A1, B1, C1), import the data from the database, and
after importing, switch the application to cores A1, B1, C1.

This will most likely leave the index out of date, as during the 1.5 hours of
indexing the database might get inserts/updates/deletes.

2. Put the live system in a read-only mode and rebuild the index during that
time. This will ensure data integrity in the index, with the drawback that
users are not able to write to the app.

Does Solr provide any built-in approaches to this problem?


best

-robert



