At one point I was working on SOLR-7344 <https://issues.apache.org/jira/browse/SOLR-7344>, but it fell off the radar for various reasons. Specifically, I built a servlet request filter which implements a customizable queuing mechanism using the asynchronous servlet API (Servlet 3 spec). This way you can define how many concurrent requests of a given type (e.g., query or indexing) you want to process. This can also be extended to the core (or collection) level.
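
To give a flavor of the approach, here is a minimal, hypothetical sketch of the pattern (not the actual code from the repository below): a semaphore caps in-flight requests, overflow is parked in a bounded queue via the Servlet 3 async API so no container thread is blocked waiting, and requests arriving beyond the queue bound are shed with a 503. The limits, class name, and single shared queue are assumptions for illustration; the real filter makes the limits configurable per request type.

    import java.io.IOException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.Semaphore;

    import javax.servlet.AsyncContext;
    import javax.servlet.DispatcherType;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical sketch only; limits and names are made up. The filter
    // must be mapped for both REQUEST and ASYNC dispatcher types.
    public class ThrottlingFilter implements Filter {

      private final Semaphore permits = new Semaphore(10);        // assumed concurrency cap
      private final LinkedBlockingQueue<AsyncContext> waiting =
          new LinkedBlockingQueue<>(50);                          // assumed queue bound
      private ExecutorService dispatcher;

      @Override
      public void init(FilterConfig config) {
        // A single background thread resumes parked requests as permits free up.
        dispatcher = Executors.newSingleThreadExecutor();
        dispatcher.submit(() -> {
          while (!Thread.currentThread().isInterrupted()) {
            try {
              AsyncContext ctx = waiting.take();
              permits.acquire();
              ctx.dispatch();   // re-enters the filter chain with DispatcherType.ASYNC
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          }
        });
      }

      @Override
      public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
          throws IOException, ServletException {
        if (req.getDispatcherType() == DispatcherType.ASYNC) {
          // Resumed from the queue; the dispatcher thread already acquired a permit.
          try {
            chain.doFilter(req, resp);
          } finally {
            permits.release();
          }
        } else if (permits.tryAcquire()) {
          // Fast path: capacity available, process the request immediately.
          try {
            chain.doFilter(req, resp);
          } finally {
            permits.release();
          }
        } else {
          // Saturated: park the request without holding a container thread.
          AsyncContext ctx = req.startAsync();
          if (!waiting.offer(ctx)) {
            // Queue full as well: shed load instead of building a doomed backlog.
            ((HttpServletResponse) ctx.getResponse()).sendError(
                HttpServletResponse.SC_SERVICE_UNAVAILABLE, "too many concurrent requests");
            ctx.complete();
          }
          // Async timeout/error handling (AsyncListener) omitted for brevity.
        }
      }

      @Override
      public void destroy() {
        dispatcher.shutdownNow();
      }
    }

The actual implementation lives here: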
https://github.com/hgadre/servletrequest-scheduler

If this is something interesting and useful for the community, I would be more than happy to help move this forward. Either way, I would welcome any feedback on possible improvements (or drawbacks).

Thanks
Hrishikesh

On Wed, Aug 2, 2017 at 9:45 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> > On Aug 2, 2017, at 8:33 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > IMHO, intentionally causing connections to fail when a limit is exceeded
> > would not be a very good idea. When the rate gets too high, the first
> > thing that happens is all the requests slow down. The slowdown could be
> > dramatic. As the rate continues to increase, some of the requests
> > probably would begin to fail.
>
> No, this is a very good idea. It is called “load shedding” or “fail fast”.
> Gracefully dealing with overload is an essential part of system design.
>
> At Netflix, with a pre-Jetty Solr (war file running under Tomcat), slow
> response times from the Solr server farm took down 40 front-end servers.
> We tied up all the front-end threads waiting on responses from the Solr
> servers. That left no front-end threads available to respond to incoming
> HTTP requests. It was not a fun evening.
>
> To fix this, we configured the Citrix load balancer to overflow to a
> different server when the outstanding back-end requests hit a limit. The
> overflow server was a virtual server that immediately returned a 503. That
> would free up front-end connections and threads in an overload condition.
> The users would get a “search unavailable” page, but the rest of the site
> would continue to work.
>
> Unfortunately, the AWS load balancers don’t offer anything like this, ten
> years later.
>
> The worst-case version of this is a stable congested state. It is pretty
> easy to put requests into a queue (connection/server) that are guaranteed
> to time out before they are serviced. If you have 35 requests in the queue,
> a 1 second service time, and a 30 second timeout, those requests are
> already dead when you put them on the queue.
>
> I learned about this when I worked with John Nagle at Ford Aerospace. I
> recommend his note “On Packet Switches with Infinite Storage” (1985) for
> the full story. It is only eight pages long, but packed with goodness.
>
> https://tools.ietf.org/html/rfc970
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
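
To make the queue arithmetic above concrete, here is a hypothetical back-of-the-envelope check (the names and numbers mirror Walter's example, and are not part of my filter): a scheduler could refuse to enqueue any request whose expected wait already exceeds the client timeout.

    // Hypothetical illustration: in a single-server FIFO queue, a new arrival
    // behind queueDepth requests waits roughly queueDepth * serviceTime.
    // If that wait already exceeds the client timeout, the request is dead
    // on arrival and should be rejected (e.g., 503) instead of enqueued.
    public class DeadOnArrivalCheck {
      static boolean deadOnArrival(int queueDepth, double serviceTimeSec, double timeoutSec) {
        return queueDepth * serviceTimeSec >= timeoutSec;
      }

      public static void main(String[] args) {
        // Walter's numbers: 35 queued requests, 1 s service time, 30 s timeout.
        // Any request behind 30 or more others times out before it is serviced.
        System.out.println(deadOnArrival(35, 1.0, 30.0));   // true -> shed load
      }
    }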