If you have a number of long queries running, your system can become CPU bound, resulting in low throughput and high response times. There are many ways to construct a query that takes a long time to process, but the SOLR-502 patch can only address those where the time is spent in collect().
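To make the collect() point concrete, here is a rough sketch of a deadline check that wraps a collector. It is not the SOLR-502 code: DocCollector, DeadlineCollector, DeadlineExceeded and PartialSearch are hypothetical names made up for illustration, and the loop stands in for whatever drives the real search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search collector; not the Lucene or Solr API.
interface DocCollector {
    void collect(int docId, float score);
}

// Thrown to break out of the search loop once the time budget is spent.
class DeadlineExceeded extends RuntimeException {
    DeadlineExceeded(String message) {
        super(message);
    }
}

// Wraps another collector and checks the clock on every collect() call.
class DeadlineCollector implements DocCollector {
    private final DocCollector delegate;
    private final long deadlineNanos;

    DeadlineCollector(DocCollector delegate, long timeAllowedMillis) {
        this.delegate = delegate;
        this.deadlineNanos = System.nanoTime() + timeAllowedMillis * 1_000_000L;
    }

    @Override
    public void collect(int docId, float score) {
        // The check only runs here, so work done outside collect() is not bounded.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new DeadlineExceeded("time allowed exceeded at doc " + docId);
        }
        delegate.collect(docId, score);
    }
}

class PartialSearch {
    // Feeds docs 0..numDocs-1 to the collector and returns whatever was gathered.
    static List<Integer> search(int numDocs, long timeAllowedMillis) {
        List<Integer> hits = new ArrayList<>();
        DocCollector limited =
            new DeadlineCollector((doc, score) -> hits.add(doc), timeAllowedMillis);
        try {
            for (int doc = 0; doc < numDocs; doc++) {
                limited.collect(doc, 1.0f);
            }
        } catch (DeadlineExceeded e) {
            // Keep the partial results instead of discarding them.
        }
        return hits;
    }
}
```

Because the clock is only consulted inside collect(), a query that spends its time elsewhere before any hit is collected would not be interrupted by a check like this.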

Here is a comment on SOLR-502 that hopefully helps answer your questions.
The timeout is there to protect the server side. The client side can be largely protected by setting a read timeout, but if the client aborts before the server responds, the server just wastes resources processing a request whose result will never be used. Partial results are useful in a couple of scenarios; probably the most important is a large distributed complex, where you would rather get whatever results you can from a slow shard than throw them away.
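For the client-side half, a minimal sketch using the JDK's HttpURLConnection might look like the following. The class name SolrClientTimeouts and the URL passed in are assumptions for illustration only; the timeout setters are standard JDK calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class SolrClientTimeouts {
    // Prepares (but does not yet open) an HTTP connection with both timeouts set.
    static HttpURLConnection open(String url, int connectMillis, int readMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMillis); // give up if the server cannot be reached
            conn.setReadTimeout(readMillis);       // give up if the response stalls
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only stops the client from waiting; the server keeps working on the abandoned request unless it has a timeout of its own.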

As a real-world example, the query "contact us about our site" on a 2.3 million document index (a partial Dmoz crawl) takes several seconds to complete, while the mean response time is under 50 ms. We've had cases where a bot walks the next-page links, including those for expensive queries such as this one. Users are also prone to clicking the query button repeatedly if they get impatient on a slow site. Without a server-side timeout, this is a real issue.

Rate limiting and limiting the number of next pages that can be fetched at the front end are also part of the solution to the above example.
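As one possible shape for the front-end rate limiting, here is a small token-bucket sketch. The class TokenBucket and its parameters are hypothetical, and the caller passes in the current time so the behaviour is easy to reason about; a real front end would keep one bucket per client or IP and pass System.nanoTime().

```java
// Hypothetical per-client rate limiter; one instance per client or IP address.
class TokenBucket {
    private final double capacity;        // maximum burst size, in requests
    private final double refillPerSecond; // sustained request rate
    private double tokens;
    private long lastNanos;

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so a new client can burst once
        this.lastNanos = nowNanos;
    }

    // Returns true if the request may proceed, false if it should be rejected.
    synchronized boolean tryAcquire(long nowNanos) {
        // Refill in proportion to the time elapsed since the last call.
        double elapsedSeconds = (nowNanos - lastNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A bucket like this absorbs the impatient user who clicks a few times in a row while still cutting off a bot that keeps requesting pages faster than the refill rate.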


McBride, John wrote:
Hello All,
Prior to SOLR 1.3 and the Nutch patch integration, what actually is the effect of SOLR's (non-)timeout? Do the threads eventually die? Does a new request cause a new query thread to open, or is the system locked? What causes a timeout: a complex query? Is SOLR 1.2 open to DoS attacks by submitting complex queries? Thanks,
