On Thu, Dec 3, 2009 at 3:59 PM, Lance Norskog <goks...@gmail.com> wrote:
> #2: The standard architecture is with a master that only does indexing
> and one or more slaves that only handle queries. The slaves poll the
> master for index updates regularly. Solr 1.4 has a built-in system for
> this.

How do you achieve durability with the standard architecture? For one of
our use cases (which does not have much churn), we are considering this
architecture, but I don't want an update to be lost if the master goes
down before the slaves update.

What I was thinking initially is that this could be achieved by having a
master per datacenter, each synchronously updating the other masters
through a RequestHandler. That would guarantee durability, but of course
this architecture would have issues of its own, like how to handle
masters no longer being in sync after a network partition. Is there any
work being done to address this use case?

> An alternate architecture has multiple servers which do both indexing
> and queries in the same index. This provides the shortest "pipeline"
> time from receiving the data to making it available for search.

For our use case, where there is a high add/delete rate, I was thinking
of using this architecture, as I noticed that records become available
right away. But here the concern is how well it performs under heavy
adding and deleting. I did an initial test adding many thousands of
documents and did not see any degradation; that's why I asked about its
performance when deleting records (since a delete only marks documents
for deletion, and we have some control over the automatic segment
merging, I guess this is not much of a problem).

Rodrigo

> On Wed, Dec 2, 2009 at 2:43 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
> > Rodrigo,
> >
> > It sounds like you're asking about near realtime search support,
> > I'm not sure. So here are a few ideas.
> >
> > #1 How often do you need to be able to search on the latest
> > updates (as opposed to updates from, let's say, 10 minutes ago)?
> >
> > To topic #2, Solr provides master/slave replication. The
> > optimize would happen on the master and the new index files
> > would be replicated to the slave(s).
> >
> > #3 is a mixed bag at this point, and there is no official
> > solution yet. Shell scripts and a load balancer could kind of
> > work. Check out SOLR-1277 or SOLR-1395 for progress along these
> > lines.
> >
> > Jason
> >
> > On Wed, Dec 2, 2009 at 11:53 AM, Rodrigo De Castro
> > <rodr...@sacaluta.com> wrote:
> >> We are considering Solr to store events which will be added to and
> >> deleted from the index at a very fast rate. Solr will be used, in
> >> this case, to find the right event we need to process (since events
> >> may have several attributes and we may search for the best match
> >> based on the query attributes). Our understanding is that the common
> >> use cases are those wherein the read rate is much higher than the
> >> write rate, and deletes are not as frequent, so we are not sure Solr
> >> handles our use case very well or whether it is the right fit. Given
> >> that, I have a few questions:
> >>
> >> 1 - How does Solr/Lucene degrade with fragmentation? That would
> >> probably determine the rate at which we would need to optimize the
> >> index. I presume it depends on the rate of insertions and deletions,
> >> but would you have any benchmark on this degradation? Or, in general,
> >> what has been your experience with this use case?
> >>
> >> 2 - Optimizing seems to be a very expensive process. While optimizing
> >> the index, how much does search performance degrade? A huge
> >> degradation would not allow us to optimize unless we switch to
> >> another copy of the index while the optimize is running.
> >>
> >> 3 - In terms of high availability, what has been your experience
> >> detecting failure of the master and having a slave take over?
> >>
> >> Thanks,
> >> Rodrigo
>
> --
> Lance Norskog
> goks...@gmail.com
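For anyone following along: the built-in polling replication discussed above is configured through Solr 1.4's ReplicationHandler in solrconfig.xml. A minimal sketch; the host name and poll interval here are placeholders, not values from this thread:

```xml
<!-- master solrconfig.xml: publish a new index version after commit/optimize -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml: poll the master for index updates -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The slave pulls only the changed index files after each commit on the master, which is why the optimize can run on the master alone and the slaves just receive the merged files.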
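The add/delete traffic Rodrigo describes would go through Solr's XML update messages; as noted above, a delete only marks documents until a merge or optimize rewrites the affected segments. A hypothetical sketch (the field names are made up for illustration, not taken from the thread):

```xml
<!-- add (or overwrite, by uniqueKey) an event document -->
<add>
  <doc>
    <field name="id">event-42</field>
    <field name="type">payment</field>
  </doc>
</add>

<!-- mark matching documents as deleted; space is reclaimed on merge/optimize -->
<delete><query>type:payment</query></delete>

<!-- make the adds and deletes visible to searchers -->
<commit/>
```

These messages are POSTed to the core's /update handler; deleted documents still occupy space (and deleted-doc overhead) until segments containing them are merged away.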
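The "control over the automatic segment merging" mentioned above is mainly the mergeFactor setting in solrconfig.xml: a lower value merges segments more aggressively, which keeps fragmentation down at the cost of more merge I/O at index time. A sketch, assuming Solr 1.4's stock config layout (the value 5 is illustrative, not a recommendation from this thread):

```xml
<indexDefaults>
  <!-- default is 10; lower values keep the segment count down,
       reducing the need to optimize, but cause more frequent
       merges while indexing -->
  <mergeFactor>5</mergeFactor>
</indexDefaults>
```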