On Thu, Dec 3, 2009 at 3:59 PM, Lance Norskog <goks...@gmail.com> wrote:
> #2: The standard architecture is with a master that only does indexing
> and one or more slaves that only handle queries. The slaves poll the
> master for index updates regularly. Solr 1.4 has a built-in system for
> this.

How do you achieve durability with the standard architecture? For one of
our use cases (which does not have much churn), we are considering this
architecture, but I don't want an update to be lost if the master goes
down before the slaves update.

What I was thinking initially is that this could be achieved by having a
master per datacenter, each synchronously updating the other masters
through a RequestHandler. That would guarantee durability, but of course
this architecture would have issues of its own, like how to handle
masters no longer being in sync after a network partition. Is there any
work being done to address this use case?

> An alternate architecture has multiple servers which do both indexing
> and queries in the same index. This provides the shortest "pipeline"
> time from receiving the data to making it available for search.

For our use case, where there is a high add/delete rate, I was thinking
of using this architecture, as I noticed that records become available
right away. But here the concern is how well it performs under heavy
adding and deleting. I did an initial test adding many thousands of
documents and did not see any degradation; that's why I asked about its
performance when deleting records (since a delete only marks documents
for deletion, and we have some control over the automatic segment
merging, I guess this is not much of a problem).

Rodrigo

> On Wed, Dec 2, 2009 at 2:43 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
> > Rodrigo,
> >
> > It sounds like you're asking about near realtime search support,
> > I'm not sure. So here are a few ideas.
> >
> > #1 How often do you need to be able to search on the latest
> > updates (as opposed to updates from, let's say, 10 minutes ago)?
> >
> > To topic #2, Solr provides master/slave replication. The
> > optimize would happen on the master and the new index files
> > would be replicated to the slave(s).
> >
> > #3 is a mixed bag at this point, and there is no official
> > solution yet. Shell scripts and a load balancer could kind of
> > work. Check out SOLR-1277 or SOLR-1395 for progress along these
> > lines.
> >
> > Jason
> >
> > On Wed, Dec 2, 2009 at 11:53 AM, Rodrigo De Castro
> > <rodr...@sacaluta.com> wrote:
> >> We are considering Solr to store events which will be added to and
> >> deleted from the index at a very fast rate. Solr will be used, in
> >> this case, to find the right event we need to process (since events
> >> may have several attributes and we may search for the best match
> >> based on the query attributes). Our understanding is that the common
> >> use cases are those wherein the read rate is much higher than the
> >> write rate, and deletes are not as frequent, so we are not sure Solr
> >> handles our use case very well or whether it is the right fit. Given
> >> that, I have a few questions:
> >>
> >> 1 - How does Solr/Lucene degrade with fragmentation? That would
> >> probably determine the rate at which we would need to optimize the
> >> index. I presume it depends on the rate of insertions and deletions,
> >> but would you have any benchmark on this degradation? Or, in general,
> >> what has been your experience with this use case?
> >>
> >> 2 - Optimizing seems to be a very expensive process. While optimizing
> >> the index, how much does search performance degrade? A huge
> >> degradation would not allow us to optimize unless we switch to
> >> another copy of the index while the optimize is running.
> >>
> >> 3 - In terms of high availability, what has been your experience
> >> detecting failure of the master and having a slave take over?
> >>
> >> Thanks,
> >> Rodrigo
>
> --
> Lance Norskog
> goks...@gmail.com
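For anyone following along: the built-in polling replication discussed above is configured through Solr 1.4's ReplicationHandler in solrconfig.xml. A minimal sketch; the host name and poll interval here are placeholders, not values from this thread:

```xml
<!-- master solrconfig.xml: publish a new index version after commit/optimize -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml: poll the master for index updates -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The slave pulls only the changed index files after each commit on the master, which is why the optimize can run on the master alone and the slaves just receive the merged files.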
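The add/delete traffic Rodrigo describes would go through Solr's XML update messages; as noted above, a delete only marks documents until a merge or optimize rewrites the affected segments. A hypothetical sketch (the field names are made up for illustration, not taken from the thread):

```xml
<!-- add (or overwrite, by uniqueKey) an event document -->
<add>
  <doc>
    <field name="id">event-42</field>
    <field name="type">payment</field>
  </doc>
</add>

<!-- mark matching documents as deleted; space is reclaimed on merge/optimize -->
<delete><query>type:payment</query></delete>

<!-- make the adds and deletes visible to searchers -->
<commit/>
```

These messages are POSTed to the core's /update handler; deleted documents still occupy space (and deleted-doc overhead) until segments containing them are merged away.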
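The "control over the automatic segment merging" mentioned above is mainly the mergeFactor setting in solrconfig.xml: a lower value merges segments more aggressively, which keeps fragmentation down at the cost of more merge I/O at index time. A sketch, assuming Solr 1.4's stock config layout (the value 5 is illustrative, not a recommendation from this thread):

```xml
<indexDefaults>
  <!-- default is 10; lower values keep the segment count down,
       reducing the need to optimize, but cause more frequent
       merges while indexing -->
  <mergeFactor>5</mergeFactor>
</indexDefaults>
```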