When we do a reindex (once a day), we post around 150-200 documents per second, on average. Our index is not as large, though: about 200k docs. During this import, the search service (with faceted page navigation) remains available for front-end searches and performance does not noticeably change. You can see this install running at http://www.6pm.com, where SOLR is used for every part of the navigation and search.

I believe that a sustained load of 150+ posts per second is very possible. At that load, though, it does make sense to consider multiple machines.
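Roughly, the posting loop looks like this -- a simplified sketch, not our production code; the URL and the 'id'/'name' fields are placeholders for whatever your schema uses:

    import urllib.request

    SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # placeholder URL

    def post_xml(xml):
        # POST one XML update message to Solr's update handler.
        req = urllib.request.Request(
            SOLR_UPDATE_URL,
            data=xml.encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    def add_batch(docs):
        # Wrap a batch of documents in a single <add> message.
        body = "".join(
            "<doc><field name='id'>%s</field><field name='name'>%s</field></doc>"
            % (d["id"], d["name"]) for d in docs
        )
        post_xml("<add>%s</add>" % body)

    # Post a few hundred docs per <add>, and commit once at the end of the
    # run rather than after every batch -- frequent commits are what hurt.
    add_batch([{"id": i, "name": "product %d" % i} for i in range(500)])
    post_xml("<commit/>")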

+--------------------------------------------------------+
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
+--------------------------------------------------------+


On Oct 9, 2007, at 10:16 AM, Charles Hornberger wrote:

I'm about to do a prototype deployment of Solr for a pretty
high-volume site, and I've been following this thread with some
interest.

One thing I want to confirm: is it really possible for Solr to handle a
constant stream of 10K updates/min (>150 updates/sec) against a
25M-document index? I knew Solr and Lucene were good, but that seems
like a pretty tall order. From the responses I'm seeing to David
Whalen's inquiries, it seems like people think that's possible.

Thanks,
Charlie

On 10/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
The way I'd do it would be to buy more servers, set up Tomcat on
each, and get SOLR replicating from your current machine to the
others. Then, throw them all behind a load balancer, and there you go.

You could also post your updates to every machine. Then you don't
need to worry about getting replication running.
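If you go that route, the indexer just fans each update out to a list of instances -- a minimal sketch, with hypothetical host names and the standard XML update handler:

    import urllib.request

    # Hypothetical Solr instances -- every update goes to every box, so
    # there is no master/slave replication to set up or monitor.
    SOLR_HOSTS = [
        "http://search1:8983/solr/update",
        "http://search2:8983/solr/update",
        "http://search3:8983/solr/update",
    ]

    def post_everywhere(xml):
        # Send the same XML update message to each instance in turn.
        for url in SOLR_HOSTS:
            req = urllib.request.Request(
                url,
                data=xml.encode("utf-8"),
                headers={"Content-Type": "text/xml; charset=utf-8"},
            )
            urllib.request.urlopen(req).read()

    post_everywhere("<add><doc><field name='id'>42</field></doc></add>")
    post_everywhere("<commit/>")

The trade-off is that a failed post to one box leaves the indexes out of sync, which is exactly what replication handles for you.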

+--------------------------------------------------------+
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
+--------------------------------------------------------+


On Oct 9, 2007, at 7:12 AM, David Whalen wrote:

All:

How can I break up my install onto more than one box?  We've
hit a learning curve here and we don't understand how best to
proceed.  Right now we have everything crammed onto one box
because we don't know any better.

So, how would you build it if you could?  Here are the specs:

a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have faceted queries

Again, real-world experience is preferred here over book knowledge.
We've tried to read the docs and it's only made us more confused.

TIA

Dave W


-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
Do you see any requests that took a really long time to finish?

The requests that take a long time to finish are just simple queries.
And the same queries run at a later time come back much faster.

Our logs contain 99% inserts and 1% queries.  We are constantly adding
documents to the index at a rate of 10,000 per minute, so the logs
show mostly that.

Oh, so you are using the same boxes for updating and querying?
When you insert, are you using multiple threads?  If so, how many?
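
By multiple threads I mean something like the sketch below (placeholder
URL and documents, not your actual indexer) -- the max_workers number is
what I'm asking about:

    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # placeholder
    NUM_THREADS = 4  # the number in question

    def post_batch(xml):
        # One worker call: POST a single <add> batch to the update handler.
        req = urllib.request.Request(
            SOLR_UPDATE_URL,
            data=xml.encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        urllib.request.urlopen(req).read()

    # Placeholder batches of <add> messages.
    batches = ["<add><doc><field name='id'>%d</field></doc></add>" % i
               for i in range(100)]

    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        # Each worker keeps one HTTP request in flight, so more threads
        # means more concurrent inserts hitting the same index.
        list(pool.map(post_batch, batches))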

What is the full URL of those slow query requests?
Do the slow requests start after a commit?

Start with the thread dump.
I bet it's multiple queries piling up around some synchronization
points in lucene (sometimes caused by multiple threads generating
the same big filter that isn't yet cached).

What would be my next steps after that?  I'm not sure I'd understand
enough from the dump to make heads-or-tails of it.  Can I share that
here?

Yes, post it here.  Most likely a majority of the threads
will be blocked somewhere deep in lucene code, and you will
probably need help from people here to figure it out.

-Yonik