I think the key question here is what's the best way to perform indexing
without affecting search performance, or without affecting it much. If
you have a batch of documents to index (say a daily batch that takes an
hour to index and merge), you'd like to do that on an offline system,
and then when ready, bring that index up for searching. But using
Lucene's multiple commit points assumes you use the same box for search
and indexing, doesn't it?
Something like this is what I have in mind (simple 2-server config here):
Box 1 is live and searching
Box 2 is offline and ready to index
Loading begins on Box 2...
Loading completes on Box 2...
Commit, optimize
Swap Box 1 and Box 2 (with a load balancer or application config?)
Box 2 is live and searching
Box 1 is offline and ready to index
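The two-box cycle above can be sketched in Java as a single reference that search routing reads, flipped in one step once the offline box has finished loading, committing and optimizing. This is a minimal sketch under stated assumptions: Box, Roles and BoxPair are hypothetical names (not Solr or Lucene classes), the real indexing work is reduced to a counter, and a single indexing thread is assumed to drive the cycles.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical names: Box, Roles and BoxPair are illustrative, not Solr/Lucene classes.
class Box {
    final String name;
    int batchesLoaded = 0;

    Box(String name) { this.name = name; }

    // Stand-in for the real work, e.g. running the import and then commit + optimize.
    void loadBatchAndOptimize() {
        batchesLoaded++;
    }
}

// Immutable snapshot of which box is live and which is offline.
class Roles {
    final Box live;
    final Box offline;

    Roles(Box live, Box offline) {
        this.live = live;
        this.offline = offline;
    }
}

class BoxPair {
    // Search routing reads this reference; one set() flips both roles at once.
    private final AtomicReference<Roles> roles;

    BoxPair(Box live, Box offline) {
        this.roles = new AtomicReference<>(new Roles(live, offline));
    }

    Box live() { return roles.get().live; }

    Box offline() { return roles.get().offline; }

    // One cycle: index on the offline box while the live box keeps serving,
    // then swap. Assumes a single indexing thread drives the cycles.
    void runIndexingCycle() {
        Roles current = roles.get();
        current.offline.loadBatchAndOptimize();
        roles.set(new Roles(current.offline, current.live));
    }
}
```

The sketch leaves out how the box that just went offline catches up on the batch it missed; before it is swapped back in, it would have to re-index that batch or copy over the newer index.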
To make the best use of your resources, you'd then like to start using
Box 1 for searching as well (until indexing starts up again). Perhaps if your
load balancing is clever enough, it could be sensitive to the decreased
performance of the indexing box and just send more requests to the other
one(s). That's probably ideal.
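A load balancer that favours the non-indexing box could look roughly like the plain weighted round-robin below. This is a hypothetical sketch: SearchBackend and WeightedBalancer are illustrative names, the 4:1 weights are assumed, and a simple "indexing" flag stands in for the measured slowdown that a smarter balancer would more likely react to (for example, response latency).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the weights and the "indexing" flag are assumptions.
class SearchBackend {
    final String name;
    volatile boolean indexing = false;

    SearchBackend(String name) { this.name = name; }

    // Assumed shares: an idle box gets 4 slots per round, an indexing box gets 1.
    int weight() { return indexing ? 1 : 4; }
}

class WeightedBalancer {
    private final List<SearchBackend> backends;
    private long counter = 0;

    WeightedBalancer(List<SearchBackend> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("need at least one backend");
        this.backends = new ArrayList<>(backends);
    }

    // Plain weighted round-robin: each backend owns a run of consecutive slots
    // proportional to its current weight.
    synchronized SearchBackend next() {
        int total = 0;
        for (SearchBackend b : backends) total += b.weight();
        long slot = counter++ % total;
        for (SearchBackend b : backends) {
            slot -= b.weight();
            if (slot < 0) return b;
        }
        throw new AssertionError("unreachable: slot is always < total");
    }
}
```

With one box idle and the other indexing, 10 consecutive requests split 8 to 2; once indexing stops, the split evens out.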
-Mike S
Under the hood, Lucene can support this by keeping multiple commit
points in the index.
So you'd make a new commit whenever you finish indexing the updates
from each hour, and record that this is the last "searchable" commit.
Then you are free to commit while indexing the next hour's worth of
changes, but these commits are not marked as searchable.
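The "searchable commit" idea can be illustrated with a toy in-memory model. This is deliberately not Lucene code: Commit and CommitLog are hypothetical classes that only mimic the bookkeeping. In recent Lucene versions the real pieces would roughly be a custom IndexDeletionPolicy that keeps the chosen commits alive, commit user data (readable via IndexCommit.getUserData()) to mark a commit as searchable, and DirectoryReader.open(IndexCommit) to open a reader on that specific commit; exact method names vary between Lucene versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only, not Lucene's API: a commit has a generation number and a
// flag saying whether the indexer marked it as "searchable".
class Commit {
    final long generation;
    final boolean searchable;

    Commit(long generation, boolean searchable) {
        this.generation = generation;
        this.searchable = searchable;
    }
}

class CommitLog {
    private final List<Commit> commits = new ArrayList<>();
    private long nextGeneration = 1;

    // The indexer commits as often as it likes; only the end-of-hour commit
    // is passed searchable = true.
    Commit commit(boolean searchable) {
        Commit c = new Commit(nextGeneration++, searchable);
        commits.add(c);
        prune();
        return c;
    }

    // Searchers open this commit, never an intermediate one (null before the first).
    Commit latestSearchable() {
        for (int i = commits.size() - 1; i >= 0; i--) {
            if (commits.get(i).searchable) return commits.get(i);
        }
        return null;
    }

    // Deletion-policy analogue: keep only the newest commit and the newest
    // searchable commit; everything in between can be dropped.
    private void prune() {
        Commit newest = commits.get(commits.size() - 1);
        Commit keep = latestSearchable();
        commits.removeIf(c -> c != newest && c != keep);
    }

    List<Commit> retained() { return Collections.unmodifiableList(commits); }
}
```

While the next hour is being indexed, searchers keep seeing the last hour's commit even though newer, non-searchable commits exist alongside it.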
But... this is a low-level Lucene capability, and I don't know of any
plans for Solr to support multiple commit points in the index.
Mike
http://blog.mikemccandless.com
On Tue, May 10, 2011 at 9:22 AM, vrpar...@gmail.com <vrpar...@gmail.com> wrote:
Hello all,
Indexing with DataImportHandler runs every hour (new records are added
and some existing records are updated). Note: the data set is large.
The requirement is that while indexing is in progress, searching (on
already indexed data) should not be affected.
So should I use multicore with merge and swap, a delta query, or some
other way?
Thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/how-to-do-offline-adding-updating-index-tp2923035p2923035.html
Sent from the Solr - User mailing list archive at Nabble.com.
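The two Solr options raised in the original question (a delta query vs. multicore with swap) correspond to ordinary HTTP requests. The hypothetical helper below only builds those URLs and sends nothing. It assumes a base URL such as http://localhost:8983/solr, a DataImportHandler registered at "/dataimport" in solrconfig.xml, and Java 10+ for URLEncoder.encode(String, Charset). The "merge" step (CoreAdmin MERGEINDEXES) is not shown.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: it only builds request URLs and sends nothing.
// Assumes the DIH handler is registered at "/dataimport" in solrconfig.xml.
class SolrRequests {
    private final String base;

    SolrRequests(String base) {
        // Normalise a trailing slash so paths join cleanly.
        this.base = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // DataImportHandler run on one core; command is e.g. "delta-import" or "full-import".
    String dataImport(String core, String command) {
        return base + "/" + enc(core) + "/dataimport?command=" + enc(command) + "&commit=true";
    }

    // CoreAdmin SWAP: the two cores exchange names, so searches addressed to
    // `live` hit the freshly built index from then on.
    String swap(String live, String offline) {
        return base + "/admin/cores?action=SWAP&core=" + enc(live) + "&other=" + enc(offline);
    }
}
```

A delta import runs against the core that is also serving searches, whereas the swap approach builds into a second core and only exposes it once it is committed, which is closer to the offline-box scheme discussed above.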