Hello,

> You are right that we would need near realtime support. The problem is not
> so much about new records becoming available, but guaranteeing that deleted
> records will not be returned. For this reason, our plan would be to update
> and search a master index, provided that: (1) search while updating records
> is ok,
It is in general, though I haven't fully tested NRT under high load.

> (2) performance is not degraded substantially due to fragmentation,

You can control that somewhat via mergeFactor.

> (3) optimization does not impact search,

It will - disk IO, OS cache, and such will be affected, and that will
affect search.

> and (4) we ensure durability - if a node goes down, an update was
> replicated to another node that can take over.

Maybe just index to more than one master? For example, another (non-search)
tool I'm using, Voldemort, has the notion of "required writes", which
represents how many copies of the data must be written at insert/add time.

> It seems that 1 and 2 are not so much of a problem, and 3 would need to
> be tested. I would like to know more about how 4 has been addressed, so
> we don't lose updates if a master goes down between an update and index
> replication.

Lucene buffers documents while indexing to avoid constant disk writes, and
the HDD itself does some of that, too. So I think you can always lose
whatever is in the buffers if it doesn't get flushed when somebody trips
over the power cord in the data center.

Otis

> > #3 is a mixed bag at this point, and there is no official
> > solution, yet. Shell scripts and a load balancer could kind of
> > work. Check out SOLR-1277 or SOLR-1395 for progress along these
> > lines.
>
> Thanks for the links.
>
> Rodrigo
>
> > On Wed, Dec 2, 2009 at 11:53 AM, Rodrigo De Castro wrote:
> > > We are considering Solr to store events which will be added to and
> > > deleted from the index at a very fast rate. Solr will be used, in
> > > this case, to find the right event we need to process (since events
> > > may have several attributes and we may search for the best match
> > > based on the query attributes).
> > > Our understanding is that the common use cases are those wherein the
> > > read rate is much higher than the write rate and deletes are not as
> > > frequent, so we are not sure whether Solr handles our use case very
> > > well or whether it is the right fit. Given that, I have a few
> > > questions:
> > >
> > > 1 - How does Solr/Lucene degrade with fragmentation? That would
> > > probably determine the rate at which we would need to optimize the
> > > index. I presume that it depends on the rate of insertions and
> > > deletions, but do you have any benchmarks on this degradation? Or,
> > > in general, what has your experience been with this use case?
> > >
> > > 2 - Optimizing seems to be a very expensive process. While the index
> > > is being optimized, how much does search performance degrade? A large
> > > degradation would not allow us to optimize unless we switched to
> > > another copy of the index while the optimize is running.
> > >
> > > 3 - In terms of high availability, what has been your experience
> > > detecting failure of the master and having a slave take over?
> > >
> > > Thanks,
> > > Rodrigo
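P.S. For reference, mergeFactor and the indexing RAM buffer mentioned above
both live in solrconfig.xml. An illustrative snippet (the values shown are
just the common defaults, not recommendations - tune them for your write
rate):

```xml
<indexDefaults>
  <!-- Higher mergeFactor: faster indexing, but more segments on disk
       (more "fragmentation"); lower: fewer segments, slower indexing. -->
  <mergeFactor>10</mergeFactor>
  <!-- Documents buffered in RAM before a flush; anything still buffered
       (and not committed) can be lost on a crash or power failure. -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>
```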
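P.P.S. To make the Voldemort-style "required writes" idea concrete: an update
is only acknowledged once at least W of N replicas have accepted it, so losing
a single master between update and replication cannot silently drop data when
W > 1. The sketch below is a hypothetical, standalone illustration of that
quorum rule - the names (Replica, RequiredWrites, update, etc.) are made up
and are not Solr's or Voldemort's actual API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RequiredWrites {

    /** Stand-in for one index master; may be down. Illustrative only. */
    static class Replica {
        final Map<String, String> store = new HashMap<>();
        boolean up = true;

        boolean write(String id, String doc) {
            if (!up) return false;   // simulate a failed node
            store.put(id, doc);
            return true;
        }
    }

    final List<Replica> replicas;
    final int requiredWrites;        // W: acks needed before success

    RequiredWrites(List<Replica> replicas, int requiredWrites) {
        this.replicas = replicas;
        this.requiredWrites = requiredWrites;
    }

    /** Returns true only if at least W replicas accepted the update. */
    boolean update(String id, String doc) {
        int acks = 0;
        for (Replica r : replicas) {
            if (r.write(id, doc)) acks++;
        }
        return acks >= requiredWrites;
    }

    public static void main(String[] args) {
        List<Replica> nodes = new ArrayList<>();
        for (int i = 0; i < 3; i++) nodes.add(new Replica());
        RequiredWrites rw = new RequiredWrites(nodes, 2); // N=3, W=2

        System.out.println(rw.update("evt-1", "payload")); // all up -> true
        nodes.get(0).up = false;                           // one master dies
        System.out.println(rw.update("evt-2", "payload")); // 2 of 3 -> true
        nodes.get(1).up = false;
        System.out.println(rw.update("evt-3", "payload")); // 1 of 3 -> false
    }
}
```

With N=3 and W=2, one master can die mid-update and the write still succeeds
on the surviving pair; only losing a second node makes the update fail loudly
instead of being silently dropped.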