I guess I was more concerned with doing frequent commits and how that
would affect the caches.  Say I have 2M docs in my main index but I
want to add docs every 2 seconds, all while doing queries.  If I do
commits every 2 seconds I basically lose any caching advantage and my
faceting performance goes down the tube.  If, however, I were to add
things to a smaller index and then roll it into the larger one every ~30
minutes, then I only take the hit of recomputing the larger index's
filter caches on that interval.  Further, if my smaller index were based
on a RAMDirectory instead of an FSDirectory, I assume computing the
filter sets for the smaller index should be fast enough even every 2
seconds.
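For what it's worth, the roll-in step I have in mind would look roughly like the sketch below, using Lucene's (2.x-era) IndexWriter.addIndexes() to fold a RAMDirectory staging index into the main on-disk index.  The analyzer choice and the "/path/to/main-index" path are just placeholders:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RollIn {
    public static void main(String[] args) throws Exception {
        // Small staging index held in RAM; new docs land here every
        // couple of seconds, so filter-set computation stays cheap.
        RAMDirectory staging = new RAMDirectory();
        IndexWriter stagingWriter =
            new IndexWriter(staging, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("id", "1", Field.Store.YES,
                          Field.Index.UN_TOKENIZED));
        stagingWriter.addDocument(doc);
        stagingWriter.close();

        // Every ~30 minutes, merge the staging index into the main
        // on-disk index; only then do the big filter caches get rebuilt.
        Directory main = FSDirectory.getDirectory("/path/to/main-index");
        IndexWriter mainWriter =
            new IndexWriter(main, new StandardAnalyzer(), true);
        mainWriter.addIndexes(new Directory[] { staging });
        mainWriter.close();
    }
}
```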

- will




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, May 10, 2007 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: fast update handlers

On 5/10/07, Will Johnson <[EMAIL PROTECTED]> wrote:
> I'm trying to setup a system to have very low index latency (1-2
> seconds) and one of the javadocs intrigued me:
>
> "DirectUpdateHandler2 implements an UpdateHandler where documents are
> added directly to the main Lucene index as opposed to adding to a
> separate smaller index"
>
>
> The plain DirectUpdateHandler also had the same in its docs.  Does this
> imply that there used to be another handler that could send docs to a
> small/faster index and then merge them in with a larger one, or that
> someone could in the future?

That was the original design, before I thought of the current method
in DUH2. DirectUpdateHandler was just meant to get things working to
establish the external interface (it's only for testing... very slow
at overwriting docs).

Adding documents to a separate index and then merging would have no
real indexing speed advantage (it's essentially what Lucene does
anyway when adding to a large index).  There would be some advantage
for index distribution, but it would complicate things greatly.

High latency is caused by segment merges... this would happen when you
periodically had to merge the smaller index into the larger anyway.
You could do some other tricks for more predictable index times... set
a large mergeFactor and then call optimize after you have added your
batch of documents.
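Roughly, with the plain Lucene (2.x) API, that trick would look something like this; the mergeFactor value and doc count are just for illustration:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class BatchAdd {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);
        // A large mergeFactor defers segment merges while the batch is
        // being added, so individual adds stay predictably fast.
        writer.setMergeFactor(1000);
        for (int i = 0; i < 100; i++) {
            Document doc = new Document();
            doc.add(new Field("id", Integer.toString(i),
                              Field.Store.YES, Field.Index.UN_TOKENIZED));
            writer.addDocument(doc);
        }
        // Pay the merge cost once, at a time of your choosing.
        writer.optimize();
        writer.close();
    }
}
```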

Stay tuned though... there has been some work on a Lucene patch to do
merging in a background thread.

-Yonik
