You can do something similar to your scenario #1 with Solr replication,
which handles a lot of the details for you instead of you manually
switching cores and such. Index to a new core, then tell your production
Solr to be a slave replicating from that new master core. It may still
have some of the same downsides as your scenario #1, since it's
essentially the same thing, but with Solr replication taking care of
some of the nuts and bolts for you.
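
As a rough sketch of the "tell production to replicate from the new core"
step: the replication handler accepts a one-off fetchindex command over
HTTP, and a masterUrl parameter can be passed along with it. The host
names and core names below are made up, and parameter names can differ
between Solr versions, so check the replication docs for your release
before relying on this. The helper only builds the URL; you would then
request it with whatever HTTP client you like.

```python
from urllib.parse import urlencode


def build_fetchindex_url(slave_core_url, master_core_url):
    """Build the URL asking a slave core to pull the index from a master core once.

    Both arguments are base URLs of Solr cores, e.g. http://host:8983/solr/A.
    Assumes the replication handler is mounted at /replication on both cores.
    """
    params = urlencode({
        "command": "fetchindex",
        # The master is addressed via its own replication handler.
        "masterUrl": master_core_url.rstrip("/") + "/replication",
    })
    return slave_core_url.rstrip("/") + "/replication?" + params


if __name__ == "__main__":
    # Hypothetical hosts: production core A pulls from freshly built core A1.
    print(build_fetchindex_url(
        "http://prod-solr:8983/solr/A",
        "http://build-solr:8983/solr/A1",
    ))
```
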

I haven't heard of any better solutions. In general, Solr doesn't seem
to be all that great at use cases where the index changes frequently in
response to user actions; it doesn't seem to have been designed with
that in mind.

You could store all your user-created data in an external store (RDBMS
or NoSQL) as well as indexing it. Then when you rebuild the index you
can get it all from there, so you won't lose any. What often seems to
work best, going along with Solr's assumptions, is to never consider a
Solr index the canonical storage location of any data. Solr isn't
really designed to be storage, it's designed to be an index. Always
keep the canonical copy of any data in an actual store, with Solr just
being an index on top of it.

That approach tends to make it easier to work out things like this,
although there can still be some tricky parts. (For example, after
you're done building your new index, but before you replicate it to
production, you might have to check the canonical store for any data
that changed between the time you started your re-index and now, and
then re-index that. And then any data that changed between the time
your second re-index began and... this could go on forever.)
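
To make that catch-up problem concrete, here is a minimal sketch of the
loop, with plain dicts standing in for the canonical store and the new
core. Everything in it is hypothetical (the catch_up name, the
modified-timestamp layout, the max_passes cap), and it ignores deletes,
which you would have to track separately. The idea is just: re-index
whatever changed since the last pass began, and give up after a few
passes rather than chase writes forever.

```python
def catch_up(store, new_index, started_at, clock, max_passes=5):
    """Re-index records changed since `started_at` until a pass finds none.

    store:      dict of id -> (modified_at, document); stand-in for the
                canonical store (e.g. the relational database).
    new_index:  dict of id -> document; stand-in for the newly built core.
    started_at: time the full re-index began.
    clock:      callable returning the current time.

    Returns True if some pass found nothing left to re-index, or False if
    changes were still arriving after `max_passes` passes.
    """
    since = started_at
    for _ in range(max_passes):
        # Note when this pass starts, so the next pass picks up anything
        # written while this one is running.
        pass_started = clock()
        changed = {
            doc_id: doc
            for doc_id, (modified_at, doc) in store.items()
            if modified_at >= since
        }
        if not changed:
            return True
        new_index.update(changed)
        since = pass_started
    return False
```

If this returns False, writes are arriving faster than you can catch up,
and one option is a very short read-only window (your scenario #2) just
for the final pass, instead of for the whole 1.5 hour rebuild.
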
Robert Gründler wrote:
Hi again,
We're getting closer to the rollout of our newly created Solr/Lucene
based search, and I'm wondering how people handle changes to their
schema on live systems.
In our case, we have 3 cores (i.e. A, B, C), where the largest one takes
about 1.5 hours for a full dataimport from the relational database. The
index is updated in real time, through post insert/update/delete events
in our ORM.
So far, I can only think of 2 scenarios for rebuilding the index if we
need to update the schema after the rollout:
1. Create 3 more cores (A1, B1, C1) - Import the data from the database -
After importing, switch the application to cores A1, B1, C1.
This will most likely cause a corrupt index, as in the 1.5 hours of
indexing, the database might get inserts/updates/deletes.
2. Put the live system in a read-only mode and rebuild the index during
that time. This will ensure data integrity in the index, with the
drawback of users not being able to write to the app.
Does Solr provide any built-in approaches to this problem?
best
-robert