Re: SOLR Timeout

Sean Timm Thu, 10 Jul 2008 12:05:23 -0700

If you have a number of long queries running, your system can become CPU-bound, resulting in low throughput and high response times. There are many ways you can construct a query that will take a long time to process, but the SOLR-502 patch can only address the ones where the work is being done in collect().
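
To show why only work done in collect() can be bounded, here is a minimal sketch of a time-limited collector. It checks a deadline each time a matching document is collected and stops with whatever hits it already has. `TimeLimitedCollector`, `timed_search` and the injectable `clock` are hypothetical names for illustration. This is not the SOLR-502 code, and it does not call Solr or Lucene. (For comparison, Solr 1.3 exposes this kind of limit as the `timeAllowed` request parameter, in milliseconds.)

```python
import time


class TimeExceeded(Exception):
    """Raised when the time budget for collecting hits has been used up."""


class TimeLimitedCollector:
    """Hypothetical sketch: accepts (doc_id, score) hits until a deadline passes."""

    def __init__(self, time_allowed_ms, clock=time.monotonic):
        self.clock = clock
        self.deadline = clock() + time_allowed_ms / 1000.0
        self.hits = []

    def collect(self, doc_id, score):
        # The deadline is only checked here, so any work done before collection
        # starts (query parsing, rewriting, term expansion) is not bounded.
        if self.clock() > self.deadline:
            raise TimeExceeded()
        self.hits.append((doc_id, score))


def timed_search(matches, time_allowed_ms, clock=time.monotonic):
    """Feeds (doc_id, score) matches to the collector.

    Returns (hits, partial); partial is True when the deadline cut collection short.
    """
    collector = TimeLimitedCollector(time_allowed_ms, clock)
    try:
        for doc_id, score in matches:
            collector.collect(doc_id, score)
    except TimeExceeded:
        return collector.hits, True
    return collector.hits, False
```

The injectable clock exists so the cut-off can be tested deterministically. A real collector would read a cheap shared timer instead of calling the clock for every document.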


Here is a comment on SOLR-502 that hopefully helps answer your questions.

The timeout is there to protect the server side. The client side can be largely protected by setting a read timeout, but if the client aborts before the server responds, the server just wastes resources processing a request whose result will never be used. Partial results are useful in a couple of scenarios. Probably the most important is a large distributed deployment, where you would rather get whatever results you can from a slow shard than throw them away.
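
To make the partial-results scenario concrete, here is a minimal sketch of a front end that fans a query out to several shards and keeps whatever answers arrive before a deadline. The shard callables, `gather_with_deadline` and the timeout values are hypothetical. This is not how Solr's distributed search is implemented. The sketch also shows the client-side problem described above: giving up on a slow shard only stops the caller from waiting, while that shard's thread keeps working on a result nobody will read.

```python
import concurrent.futures as cf


def gather_with_deadline(shards, query, timeout_s):
    """Hypothetical sketch: send `query` to every shard callable and keep the
    results that arrive within `timeout_s` seconds.

    Returns (results, partial), where results maps shard name -> shard result.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(shards)))
    futures = {pool.submit(fn, query): name for name, fn in shards.items()}
    done, _not_done = cf.wait(futures, timeout=timeout_s)
    # Stop waiting, but slow shards are NOT interrupted: their threads run to
    # completion in the background. This is the wasted work that only a
    # server-side timeout can bound.
    pool.shutdown(wait=False)
    results = {}
    for fut in done:
        if fut.exception() is None:  # a failed shard is treated like a slow one
            results[futures[fut]] = fut.result()
    return results, len(results) < len(shards)
```

Returning a `partial` flag alongside the results lets the caller tell users (or logs) that some shards did not answer in time, rather than silently presenting an incomplete result list as complete.
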
As a real-world example, the query "contact us about our site" on a 2.3MM-document index (a partial Dmoz crawl) takes several seconds to complete, while the mean response time is under 50 ms. We've had cases where a bot walks the next-page links, including those of expensive queries such as this one. Users are also prone to clicking the query button repeatedly when they get impatient on a slow site. Without a server-side timeout, this is a real issue.
Rate limiting, and limiting the number of next pages that can be fetched at the front end, are also part of the solution to the above example.
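
For the front-end half of the solution, here is a minimal sketch that combines a per-client token bucket with a cap on how deep paging may go. `TokenBucket`, `accept_request`, `MAX_PAGE = 10` and the default rate and burst size are hypothetical choices for illustration, not anything provided by Solr.

```python
import time

MAX_PAGE = 10  # hypothetical limit on how deep "next page" links may go


class TokenBucket:
    """Allows on average `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the time elapsed since the last call, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def accept_request(buckets, client_id, page, rate=1.0, capacity=5, clock=time.monotonic):
    """Decides at the front end whether a search request may be sent to Solr."""
    if page > MAX_PAGE:
        return False  # refuse deep paging outright, e.g. a bot walking next-page links
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(rate, capacity, clock)
    return buckets[client_id].allow()
```

Rejected requests never reach the search server, so an impatient user hammering the query button, or a bot paging through an expensive query, costs only a cheap in-memory check.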

-Sean

McBride, John wrote:

Hello All,
Prior to SOLR 1.3 and the Nutch patch integration, what actually is the effect of the SOLR (non-)timeout? Do the threads eventually die? Does a new request cause a new query thread to open, or is the system locked? What causes a timeout: a complex query? Is SOLR 1.2 open to DoS attacks by submitting complex queries?
Thanks,
John
