Are you sure the requests are getting queued because the LB is detecting that Solr won't handle them?

The reason I'm asking is that ELB doesn't handle bursts well. The load balancer needs to "warm up," which essentially means it may be underpowered at the start of a burst. It will spin up more resources if the average load over the last minute is high, but during that minute it may not be able to handle the burst.

If you're testing the infrastructure with a benchmarking tool that doesn't ramp up traffic slowly, you're very likely running into this problem.
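To make the ramp-up idea concrete, here is a minimal sketch of a linear ramp schedule that a load test could follow instead of starting at full rate. The function name `ramp_schedule`, the five-minute ramp, and the one-minute step are all hypothetical choices for illustration, not AWS recommendations; the 333 req/s target is just the 20000 requests per minute from the original message divided by 60.

```python
def ramp_schedule(target_rps, ramp_seconds, step_seconds=10):
    """Return a list of (start_offset_seconds, requests_per_second) steps.

    The rate rises linearly and reaches target_rps in the final step.
    Hypothetical helper for illustration; a real load tool would have
    its own way of expressing a ramp.
    """
    if target_rps <= 0 or ramp_seconds <= 0 or step_seconds <= 0:
        raise ValueError("all arguments must be positive")
    steps = max(1, ramp_seconds // step_seconds)
    schedule = []
    for i in range(1, steps + 1):
        rate = target_rps * i / steps
        schedule.append(((i - 1) * step_seconds, rate))
    return schedule


if __name__ == "__main__":
    # Assumed values: ~333 req/s target, 5-minute ramp, 1-minute steps.
    for offset, rate in ramp_schedule(target_rps=333, ramp_seconds=300, step_seconds=60):
        print(f"t+{offset:>3}s: {rate:.0f} req/s")
```

If the surge queue stays empty under a ramp like this but fills up when the same peak rate is applied all at once, that would point at the load balancer rather than at Solr.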

Michael

Jani, Vrushank <mailto:vrushank.j...@truelocal.com.au>
2015-05-19 at 03:51

Hello,

We have production Solr deployed on AWS. We currently have 4 live Solr servers running on m3.xlarge EC2 instances behind an ELB (Elastic Load Balancer). We run Apache Solr in a Tomcat container, which sits behind Apache httpd. Apache httpd uses the prefork MPM, and requests flow from the ELB to Apache httpd to Tomcat (via AJP).

Over the last few days, we have seen traffic increase to around 20000 requests per minute hitting the LB. As a result, the ELB Surge Queue Length is continuously around 100. (Surge Queue Length represents the total number of requests pending submission to the instances, queued by the load balancer.)

This is causing latency and timeouts in client applications. Our first thought was that we don't have enough max connections set in either httpd or Tomcat. However, what we see is that the servers are very lightly loaded, with very low CPU and memory utilisation. The Apache prefork settings on each server are below, with keep-alive turned off.

<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
</IfModule>


Tomcat server.xml has the following settings.

<Connector port="8080" protocol="AJP/1.3" address="127.0.0.1" maxThreads="500" connectionTimeout="60000"/>

For httpd: we see lots of TIME_WAIT connections on the Apache port (around 7000+), but only around 20 ESTABLISHED connections.
For Tomcat: we see about 60 ESTABLISHED connections on the Tomcat AJP port.

So the servers and connections don't look fully utilised. There is no visible stress anywhere. However, requests are still being queued on the LB because they cannot be served by the underlying servers.
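As a rough sanity check of the numbers above, the sketch below works out the per-server request rate, the TIME_WAIT count that rate would produce, and the request latency implied by the ESTABLISHED count (via Little's law: concurrency = rate x latency). It rests on several assumptions that are not confirmed here: traffic is spread evenly across the 4 servers, httpd is the side that closes each connection (plausible with keep-alive off), and sockets stay in TIME_WAIT for 60 seconds, the usual Linux value. The helper names are made up for this example.

```python
TOTAL_REQUESTS_PER_MINUTE = 20000  # request rate reported at the LB
SERVERS = 4                        # live Solr servers behind the ELB
ESTABLISHED_ON_HTTPD = 20          # ESTABLISHED connections seen on the Apache port
TIME_WAIT_SECONDS = 60             # assumption: typical Linux TIME_WAIT duration


def per_server_rps(total_per_minute, servers):
    """Requests per second per server, assuming an even spread."""
    return total_per_minute / 60 / servers


def expected_time_wait(rps, time_wait_seconds):
    """Sockets in TIME_WAIT if every request opens and closes one connection."""
    return rps * time_wait_seconds


def implied_latency_seconds(concurrent, rps):
    """Little's law rearranged: latency = concurrency / rate."""
    return concurrent / rps


if __name__ == "__main__":
    rps = per_server_rps(TOTAL_REQUESTS_PER_MINUTE, SERVERS)
    print(f"per-server rate: {rps:.1f} req/s")
    print(f"expected TIME_WAIT: {expected_time_wait(rps, TIME_WAIT_SECONDS):.0f}")
    print(f"implied latency: {implied_latency_seconds(ESTABLISHED_ON_HTTPD, rps):.2f} s")
```

Under those assumptions the expected TIME_WAIT count is in the same order of magnitude as the 7000+ observed, so the TIME_WAIT figure alone may simply reflect keep-alive being off rather than a fault.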

Can you please help me resolve this issue? Can you see any apparent problem here? Am I missing any configuration or settings for Solr?

Your help will be truly appreciated.

Regards
VJ

Vrushank Jani
Senior Java Developer
T 02 8312 1625
E vrushank.j...@truelocal.com.au
