> Oh, so you are using the same boxes for updating and querying?

Yep.  We have a MySQL database on the box; a Perl script queries it
and POSTs the results directly into Solr using wget.  We then also
hit the same box for queries.

[We'd be very interested in hearing about best practices for
separating the data from the index, and for balancing the two when
inserts outnumber selects by a factor of 50,000:1]
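One common pattern for this is to do all indexing on one box and serve searches from separate replicas that receive copies of the index (in Solr 1.x, typically via the collection distribution scripts such as snapshooter and snappuller). Below is a minimal Python sketch of only the client-side half of that idea. The host names are hypothetical, and the sketch just decides which URL each request goes to. It does not copy the index.

```python
from itertools import cycle

# Hypothetical hosts: one indexing box takes every update,
# read-only replicas (fed a copy of the index) serve searches.
INDEXER = "http://solr-index:8983/solr"
REPLICAS = ["http://solr-query1:8983/solr", "http://solr-query2:8983/solr"]


class SolrRouter:
    """Send updates to the single indexer; round-robin selects across replicas."""

    def __init__(self, indexer, replicas):
        self.indexer = indexer
        self._replicas = cycle(replicas)  # endless iterator over replica URLs

    def update_url(self):
        # All adds and commits go to the indexing box only.
        return self.indexer + "/update"

    def select_url(self):
        # Each query goes to the next replica in turn.
        return next(self._replicas) + "/select/"


if __name__ == "__main__":
    router = SolrRouter(INDEXER, REPLICAS)
    print(router.update_url())
    for _ in range(3):
        print(router.select_url())
```

With this split, the constant stream of updates and commits no longer competes with searches for the same CPU, disk and caches. The cost is that replicas see new documents only after the next index copy is installed.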

> When you insert, are you using multiple threads?  If so, how many?

We're not threading at all.  We have a Perl script that runs a SELECT
against a MySQL database and POSTs the results into Solr sequentially,
one document per POST.  After each batch of 10,000 POSTs, we run a
background commit (using waitFlush and waitSearcher).
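To make the indexing loop concrete, here is a Python sketch of the request bodies it sends (the actual script is Perl). It assumes Solr's XML update format: an `<add>` message and a `<commit/>` message. The field names are hypothetical. The `docs_per_post` parameter is also an assumption added for illustration: it groups several documents into one `<add>` instead of one per POST, which is something worth trying rather than a measured fix. The thread does not say which values are passed for waitFlush and waitSearcher, so the `false`/`false` default here is an assumption matching "background commit".

```python
from xml.sax.saxutils import escape, quoteattr

COMMIT_EVERY = 10000  # commit interval described in the thread


def add_message(docs):
    """Build one Solr XML <add> message containing every dict in `docs`."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append("<field name=%s>%s</field>" % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def commit_message(wait_flush=False, wait_searcher=False):
    """Build a <commit/>; both flags false (assumed) means don't block on flush or new searcher."""
    return '<commit waitFlush="%s" waitSearcher="%s"/>' % (
        str(wait_flush).lower(), str(wait_searcher).lower())


def update_bodies(rows, docs_per_post=100, commit_every=COMMIT_EVERY):
    """Yield POST bodies: grouped <add> messages plus a <commit/> roughly every `commit_every` docs."""
    pending, since_commit = [], 0
    for row in rows:
        pending.append(row)
        if len(pending) == docs_per_post:
            yield add_message(pending)
            since_commit += len(pending)
            pending = []
            if since_commit >= commit_every:
                yield commit_message()
                since_commit = 0
    if pending:  # flush the final partial group
        yield add_message(pending)
        since_commit += len(pending)
    if since_commit:  # commit anything not yet committed
        yield commit_message()


if __name__ == "__main__":
    rows = ({"id": i, "name": "doc %d" % i} for i in range(3))
    for body in update_bodies(rows, docs_per_post=2):
        print(body)
```

In the real script, each body would be POSTed to the `/solr/update` handler with an XML content type. Setting `docs_per_post=1` reproduces the current one-document-per-POST behavior.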

Again, I'd be very grateful to hear success stories from people about
good server architecture.  We're ready and willing to change our Linux
version, our Java container, etc., and we're ready to add more boxes
if that will help.  We just need some guidance.

> What is the full URL of those slow query requests?

They can be anything.  For example:

[08/10/2007:18:51:55 +0000] "GET /solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on HTTP/1.1" 200 45799

> Do the slow requests start after a commit?

Based on the way the logs read, you could argue that.  The stream of
POSTs ends in the logs and then subsequent queries take longer to
run, but it's hard to be sure there's a direct correlation.
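One way to test that correlation is to measure, for each select in the request log, how long it has been since the last update request. Both adds and commits go to `/update`. The Python sketch below assumes request-log lines shaped like the example above. The day/month/year date order is inferred from that single line, so it is an assumption. The log shows no request durations, so the gaps would still need to be matched against the queries you know were slow.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the example in this thread:
# [08/10/2007:18:51:55 +0000] "GET /solr/select/?q=solr&... HTTP/1.1" 200 45799
LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)')


def parse(line):
    """Return (timestamp, method, path) for a request-log line, or None if it doesn't match."""
    m = LINE.search(line)
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%m/%Y:%H:%M:%S %z")  # dd/mm/yyyy assumed
    return ts, m.group("method"), m.group("path")


def queries_after_updates(lines):
    """For each /select request, report seconds since the most recent /update request."""
    last_update = None
    report = []
    for line in lines:
        entry = parse(line)
        if entry is None:
            continue
        ts, _method, path = entry
        if "/update" in path:
            last_update = ts
        elif "/select" in path:
            gap = (ts - last_update).total_seconds() if last_update is not None else None
            report.append((path, gap))
    return report


if __name__ == "__main__":
    import sys
    for path, gap in queries_after_updates(sys.stdin):
        print(gap, path)
```

If the slow queries cluster at small gaps after a commit, that points toward the new searcher (with its empty caches) being opened.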

> Yes, post it here.  Most likely a majority of the threads 
> will be blocked somewhere deep in lucene code, and you will 
> probably need help from people here to figure it out.

Next time it happens I'll shoot it over.
  
--Dave


> -----Original Message-----
> From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 3:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Availability Issues
> 
> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> > > Do you see any requests that took a really long time to finish?
> >
> > The requests that take a long time to finish are just simple queries.
> > And the same queries run at a later time come back much faster.
> >
> > Our logs contain 99% inserts and 1% queries.  We are constantly adding
> > documents to the index at a rate of 10,000 per minute, so the logs
> > show mostly that.
> 
> Oh, so you are using the same boxes for updating and querying?
> When you insert, are you using multiple threads?  If so, how many?
> 
> What is the full URL of those slow query requests?
> Do the slow requests start after a commit?
> 
> > > Start with the thread dump.
> > > I bet it's multiple queries piling up around some synchronization 
> > > points in lucene (sometimes caused by multiple threads generating 
> > > the same big filter that isn't yet cached).
> >
> > What would be my next steps after that?  I'm not sure I'd understand
> > enough from the dump to make heads-or-tails of it.  Can I share that
> > here?
> 
> Yes, post it here.  Most likely a majority of the threads 
> will be blocked somewhere deep in lucene code, and you will 
> probably need help from people here to figure it out.
> 
> -Yonik
> 
> 
