Are you sure the requests are getting queued because the LB is detecting
that Solr won't handle them?
The reason I'm asking is that I know ELB doesn't handle bursts well.
The load balancer needs to "warm up," which essentially means it might
be underpowered at the beginning of a burst. It will spool up more
resources if the average load over the last minute is high. But for that
minute it will definitely not be able to handle a burst.
If you're testing infrastructure using a benchmarking tool that doesn't
slowly ramp up traffic, you're definitely encountering this problem.
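
As a rough illustration of what "slowly ramp up" could look like, here is a minimal Python sketch that builds a linear ramp schedule for a load test. The function name `ramp_schedule`, the step size, and the five-minute ramp are assumptions made for the example; they are not ELB requirements or settings from any particular benchmarking tool. The 333 req/s target is simply 20000 requests per minute expressed per second.

```python
def ramp_schedule(target_rps, ramp_seconds, step_seconds=10):
    """Return (start_offset_seconds, requests_per_second) steps for a linear ramp.

    Hypothetical helper: the load rises in equal increments and reaches
    target_rps on the final step.
    """
    steps = max(1, ramp_seconds // step_seconds)
    return [
        (i * step_seconds, target_rps * (i + 1) / steps)
        for i in range(steps)
    ]


if __name__ == "__main__":
    # Assumed values: ramp to ~333 req/s over five minutes, one step per minute.
    for offset, rps in ramp_schedule(target_rps=333, ramp_seconds=300, step_seconds=60):
        print(f"t+{offset:>3}s: {rps:.0f} req/s")
```

Feeding a schedule like this to the benchmarking tool, rather than starting at full rate, gives the load balancer time to scale before the peak arrives.
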
Michael
Jani, Vrushank <mailto:vrushank.j...@truelocal.com.au>
2015-05-19 at 03:51
Hello,
We have production SOLR deployed on AWS. We currently have 4 live SOLR
servers running on m3.xlarge EC2 instances behind an ELB (Elastic Load
Balancer). We run Apache SOLR in a Tomcat container which sits behind
Apache httpd. Apache httpd uses the prefork MPM, and requests flow from
the ELB to Apache httpd to Tomcat (via AJP).
Over the last few days, we have seen requests hitting the LB increase
to around 20,000 per minute. As a result, the ELB Surge Queue Length is
continuously around 100.
Surge Queue Length: the total number of requests pending submission to
the instances, queued by the load balancer.
This is causing latency and timeouts in client applications. Our first
reaction was that we didn't have enough max connections set in either
HTTPD or Tomcat. What we saw, however, is that the servers are very
lightly loaded, with very low CPU and memory utilisation. The Apache
prefork settings on each server are as below, with keep-alive turned off.
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
</IfModule>
Tomcat server.xml has the following settings.
<Connector port="8080" protocol="AJP/1.3" address="127.0.0.1"
maxThreads="500" connectionTimeout="60000"/>
For HTTPD – we see lots of TIME_WAIT connections on the Apache port
(around 7000+), but only around 20 ESTABLISHED connections.
For Tomcat – we see about 60 ESTABLISHED connections on the Tomcat AJP
port.
So the servers and connections don't look fully utilised. There is no
visible stress anywhere. However, we still get requests queued up on
the LB because they cannot be served by the underlying servers.
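
As a rough sanity check on these numbers, the Python sketch below applies Little's law (concurrency = arrival rate × latency) to the figures quoted in this message. It assumes the ELB spreads traffic evenly across the 4 servers and that the ESTABLISHED count approximates requests in flight; both are simplifications, and the helper names are made up for the example.

```python
def per_server_rate(requests_per_minute, servers):
    """Requests per second reaching each server, assuming an even spread."""
    return requests_per_minute / 60 / servers


def implied_latency(concurrent_requests, rate_per_second):
    """Little's law rearranged: latency = concurrency / arrival rate."""
    return concurrent_requests / rate_per_second


if __name__ == "__main__":
    rate = per_server_rate(20000, 4)        # figures from this message
    observed = implied_latency(20, rate)    # ~20 ESTABLISHED on httpd
    ceiling = implied_latency(256, rate)    # MaxClients 256

    print(f"per-server rate: {rate:.1f} req/s")
    print(f"latency implied by 20 busy connections: {observed:.2f} s")
    print(f"latency at which 256 workers would be saturated: {ceiling:.2f} s")
```

Under those assumptions each server sees about 83 requests per second, and the prefork workers would only be exhausted if average latency were near 3 seconds, far above what roughly 20 busy connections imply.
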
Can you please help me resolve this issue? Can you see any apparent
problem here? Am I missing any configuration or settings for SOLR?
Your help will be truly appreciated.
Regards
VJ
Vrushank Jani
Senior Java Developer
T 02 8312 1625
E vrushank.j...@truelocal.com.au