Hello,

> You are right that we would need near realtime support. The problem is not
> so much about new records becoming available, but guaranteeing that deleted
> records will not be returned. For this reason, our plan would be to update
> and search a master index, provided that: (1) search while updating records
> is ok,
It is in general, though I haven't fully tested NRT under high load.

> (2) performance is not degraded substantially due to fragmentation,

You can control that somewhat via mergeFactor.

> (3) optimization does not impact search,

It will - disk IO, OS cache, and such will be affected, and that will
affect search.

> and (4) we ensure durability - if a node goes down, an update was
> replicated to another node that can take over.

Maybe just index to more than one master? For example, another (non-search)
tool I'm using, Voldemort, has the notion of "required writes", which
represents how many copies of the data must be written at insert/add time.

> It seems that 1 and 2 are not so much of a problem, and 3 would need to
> be tested. I would like to know more about how 4 has been addressed, so
> we don't lose updates if a master goes down between an update and index
> replication.

Lucene buffers documents while indexing to avoid constant disk writes, and
the HDD itself does some of that, too. So I think you can always lose
whatever is in the buffers if it doesn't get flushed when somebody trips
over the power cord in the data center.

Otis

> > #3 is a mixed bag at this point, and there is no official
> > solution, yet. Shell scripts and a load balancer could kind of
> > work. Check out SOLR-1277 or SOLR-1395 for progress along these
> > lines.
>
> Thanks for the links.
>
> Rodrigo
>
> > On Wed, Dec 2, 2009 at 11:53 AM, Rodrigo De Castro wrote:
> > > We are considering Solr to store events which will be added to and
> > > deleted from the index at a very fast rate. Solr will be used, in
> > > this case, to find the right event we need to process (since events
> > > may have several attributes and we may search for the best match
> > > based on the query attributes).
> > > Our understanding is that the common use cases are those wherein the
> > > read rate is much higher than the write rate and deletes are not as
> > > frequent, so we are not sure whether Solr handles our use case very
> > > well or whether it is the right fit. Given that, I have a few
> > > questions:
> > >
> > > 1 - How does Solr/Lucene degrade with fragmentation? That would
> > > probably determine the rate at which we would need to optimize the
> > > index. I presume that it depends on the rate of insertions and
> > > deletions, but do you have any benchmarks on this degradation? Or,
> > > in general, what has your experience been with this use case?
> > >
> > > 2 - Optimizing seems to be a very expensive process. While the index
> > > is being optimized, how much does search performance degrade? A large
> > > degradation would not allow us to optimize unless we switched to
> > > another copy of the index while the optimize is running.
> > >
> > > 3 - In terms of high availability, what has been your experience
> > > detecting failure of the master and having a slave take over?
> > >
> > > Thanks,
> > > Rodrigo
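P.S. For reference, mergeFactor and the indexing RAM buffer mentioned above
both live in solrconfig.xml. An illustrative snippet (the values shown are
just the common defaults, not recommendations - tune them for your write
rate):

```xml
<indexDefaults>
  <!-- Higher mergeFactor: faster indexing, but more segments on disk
       (more "fragmentation"); lower: fewer segments, slower indexing. -->
  <mergeFactor>10</mergeFactor>
  <!-- Documents buffered in RAM before a flush; anything still buffered
       (and not committed) can be lost on a crash or power failure. -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>
```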
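P.P.S. To make the Voldemort-style "required writes" idea concrete: an update
is only acknowledged once at least W of N replicas have accepted it, so losing
a single master between update and replication cannot silently drop data when
W > 1. The sketch below is a hypothetical, standalone illustration of that
quorum rule - the names (Replica, RequiredWrites, update, etc.) are made up
and are not Solr's or Voldemort's actual API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RequiredWrites {

    /** Stand-in for one index master; may be down. Illustrative only. */
    static class Replica {
        final Map<String, String> store = new HashMap<>();
        boolean up = true;

        boolean write(String id, String doc) {
            if (!up) return false;   // simulate a failed node
            store.put(id, doc);
            return true;
        }
    }

    final List<Replica> replicas;
    final int requiredWrites;        // W: acks needed before success

    RequiredWrites(List<Replica> replicas, int requiredWrites) {
        this.replicas = replicas;
        this.requiredWrites = requiredWrites;
    }

    /** Returns true only if at least W replicas accepted the update. */
    boolean update(String id, String doc) {
        int acks = 0;
        for (Replica r : replicas) {
            if (r.write(id, doc)) acks++;
        }
        return acks >= requiredWrites;
    }

    public static void main(String[] args) {
        List<Replica> nodes = new ArrayList<>();
        for (int i = 0; i < 3; i++) nodes.add(new Replica());
        RequiredWrites rw = new RequiredWrites(nodes, 2); // N=3, W=2

        System.out.println(rw.update("evt-1", "payload")); // all up -> true
        nodes.get(0).up = false;                           // one master dies
        System.out.println(rw.update("evt-2", "payload")); // 2 of 3 -> true
        nodes.get(1).up = false;
        System.out.println(rw.update("evt-3", "payload")); // 1 of 3 -> false
    }
}
```

With N=3 and W=2, one master can die mid-update and the write still succeeds
on the surviving pair; only losing a second node makes the update fail loudly
instead of being silently dropped.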