If you have a number of long queries running, your system can become CPU bound, resulting in low throughput and high response times. There are many ways to construct a query that takes a long time to process, but the SOLR-502 patch can only address the ones where the work is being done in collect().
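For context, the patch takes roughly the approach of Lucene's TimeLimitedCollector: the clock can only be consulted each time a hit is collected. A minimal sketch of the idea (class and message names here are hypothetical, not the patch's actual code):

    // Sketch of the collect()-based timeout behind SOLR-502 (hypothetical
    // names; the real patch builds on Lucene's TimeLimitedCollector).
    import org.apache.lucene.search.HitCollector;

    public class TimeLimitedCollectorSketch extends HitCollector {
        private final HitCollector delegate;
        private final long deadline;

        public TimeLimitedCollectorSketch(HitCollector delegate, long timeAllowedMillis) {
            this.delegate = delegate;
            this.deadline = System.currentTimeMillis() + timeAllowedMillis;
        }

        public void collect(int doc, float score) {
            // The clock is only checked here, once per collected hit. Work done
            // outside collect() (rewriting a huge wildcard query, say) is never
            // interrupted, which is the limitation noted above.
            if (System.currentTimeMillis() > deadline) {
                throw new RuntimeException("time allowed exceeded; results so far are partial");
            }
            delegate.collect(doc, score);
        }
    }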

Here is a comment on SOLR-502 that hopefully helps answer your questions.
The timeout is there to protect the server side. The client side can be largely protected by setting a read timeout, but if the client aborts before the server responds, the server just wastes resources processing a request whose results will never be used. Partial results are useful in a couple of scenarios; probably the most important is a large distributed setup, where you would rather get whatever results you can from a slow shard than throw them away.
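Concretely, a caller can guard both sides: a client read timeout plus the server-side limit (assuming the timeAllowed parameter, in milliseconds, that the SOLR-502 patch adds; the URL and values below are illustrative):

    // Guarding both sides of a request: a client-side read timeout plus the
    // server-side timeAllowed parameter (milliseconds) added by SOLR-502.
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class TimeoutExample {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8983/solr/select"
                    + "?q=contact+us+about+our+site"
                    + "&timeAllowed=500");       // server stops collecting after 500 ms
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(1000);        // fail fast if the server is unreachable
            conn.setReadTimeout(2000);           // client gives up waiting after 2 s
            // When the server-side limit fires, the response still comes back,
            // flagged as partial rather than being thrown away.
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }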

As a real-world example, the query "contact us about our site" on a 2.3MM-document index (a partial Dmoz crawl) takes several seconds to complete, while the mean response time is sub-50 ms. We've had cases where a bot walks the next-page links (including for expensive queries such as this one). Users are also prone to clicking the query button repeatedly if they get impatient with a slow site. Without a server-side timeout, this is a real issue.

Rate limiting, and capping the number of next pages that can be fetched at the front end, are also part of the solution to the example above.
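The paging cap can be as simple as clamping the start offset at the front end before the request ever reaches Solr. A minimal sketch (MAX_START is an illustrative limit, not anything Solr itself defines):

    // Minimal front-end guard against deep paging: clamp the requested start
    // offset so a bot walking next-page links cannot push it arbitrarily deep.
    public class PagingGuard {
        private static final int MAX_START = 1000;  // deepest offset we will serve

        public static int clampStart(int requestedStart) {
            if (requestedStart < 0) {
                return 0;
            }
            return Math.min(requestedStart, MAX_START);
        }
    }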

-Sean

McBride, John wrote:
Hello All,
Prior to SOLR 1.3 and the Nutch patch integration, what actually is the effect of SOLR's (non-)timeout? Do the threads eventually die? Does a new request cause a new query thread to open, or is the system locked? What causes a timeout: a complex query? Is SOLR 1.2 open to DoS attacks via submitted complex queries? Thanks,
John
