Hi Guy,

On Tue, Jul 27, 2010 at 12:55:54AM -0700, Guy Knights wrote:
> We have a setup currently with haproxy in front of 5 webservers
> running nginx and apache. Nginx set up to proxy dynamic requests to
> PHP and serve out static content itself. It seems to work well, and
> over time we've adjusted the maxconn settings gradually to where it's
> now configured with 25000 maxconn, with each of the 5 web servers set
> to a maxconn of 5000. This is probably overkill as haproxy is usually
> serving no more than 500 current connections in total, but the web
> servers are running on some pretty powerful hardware so my thinking is
> that it doesn't hurt to have the option there in case load increases
> over time (which we foresee being the case).

Yes, in fact it can hurt. The main goal of the server maxconn setting
is precisely to prevent the server from crashing or swapping due to
too many concurrent connections. While nginx will have no problem
handling thousands of concurrent connections, it can become a different
problem when those connections are dynamic because the amount of memory
consumed by one connection is a lot higher than what it can be for a
static file. When you see some apache processes running at 20 MB per
process on PHP or Perl, you can easily imagine that even with 12 GB
of RAM, they won't ever support more than 600 concurrent connections.
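
A rough sketch of that arithmetic, assuming every dynamic request pins
one whole process. The function name and the optional reserve for the
OS are illustrative assumptions; the 20 MB and 12 GB figures are just
the example above:

```python
MB = 1024 ** 2
GB = 1024 ** 3

def max_dynamic_connections(ram_bytes, bytes_per_process, reserved_bytes=0):
    """Upper bound on concurrent dynamic requests when each one needs a process."""
    usable = ram_bytes - reserved_bytes  # RAM left once the reserve is set aside
    if usable <= 0:
        return 0
    return usable // bytes_per_process

# 12 GB of RAM, 20 MB per Apache process, nothing reserved for the system
print(max_dynamic_connections(12 * GB, 20 * MB))          # 614, i.e. roughly 600
# Same box, keeping a hypothetical 2 GB aside for the OS and caches
print(max_dynamic_connections(12 * GB, 20 * MB, 2 * GB))  # 512
```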

> The web servers are 16 core, with 12gb ram, while the haproxy server
> is quad core, also with 12gb ram. This being the case, can anyone
> offer any advice as to whether I should tweak the maxconn settings -
> whether to lower or raise it, either globally or at a server-specific
> level?

The correct way to proceed is the following:

  1) determine your servers' maximum concurrent connections in the
     worst case (eg: all on PHP because some requests are slowing
     down). I mean the *real* limit, not an imaginary one. For instance,
     if you have 12 GB of RAM but you have an Apache MaxClients set to
     256, no matter what you do, you'll never get above 256.

  2) assign a slightly lower maxconn value to each server in the
     haproxy configuration. By "slightly lower" I mean that you
     want to save some resources on the server (eg: memory during
     backups or log rotation), and you also want to be able to
     directly connect to the servers from haproxy (health checks)
     or from your own browser to check that everything works.

  3) add up all the servers' maxconn values in your backends, and
     you'll have the lower bound for the frontend's maxconn. Now you want
     haproxy to be able to manage queues and to support load variations,
     so you'll probably double or triple that value (or even more).
     Take into account the amount of available RAM on your haproxy
     machine. Count about 32kB per connection for haproxy plus about
     as much for the system when it's finely tuned. Check that haproxy
     plus the system don't eat more than 2/3 of the RAM. Note that on
     32-bit systems, one haproxy process will never be able to allocate
     more than 2-3 GB depending on how the system was compiled. Then,
     assign the resulting value to the frontend's maxconn setting.

  4) assign a value to the global maxconn setting that must be slightly
     above the frontend's maxconn. This will help with stats socket and
     will also allow you to support multiple concurrent frontends if
     required.
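
The four steps can be sketched as a small calculation. This is only an
illustration under stated assumptions: the function name, the 90%
headroom, the queue factor of 3 and the margin of 100 for the global
maxconn are made-up example values, not recommendations. The 32 kB per
connection for haproxy plus as much for the system, and the 2/3 RAM
ceiling, come from the steps above.

```python
GB = 1024 ** 3
KB = 1024

def size_maxconn(server_limits, haproxy_ram_bytes,
                 headroom_pct=90, queue_factor=3, global_margin=100):
    """Derive server, frontend and global maxconn values from real server limits."""
    # Step 2: stay slightly below each server's real limit (step 1's result).
    server_maxconn = [max(1, limit * headroom_pct // 100) for limit in server_limits]

    # Step 3: the sum is the lower bound; multiply it to leave room for queues.
    lower_bound = sum(server_maxconn)
    wanted = lower_bound * queue_factor

    # About 32 kB for haproxy plus about as much for the system, per connection,
    # and haproxy plus the system should not use more than 2/3 of the RAM.
    bytes_per_conn = 2 * 32 * KB
    ram_cap = (haproxy_ram_bytes * 2 // 3) // bytes_per_conn
    frontend_maxconn = min(wanted, ram_cap)

    # Step 4: global maxconn slightly above the frontend's.
    return {
        "server_maxconn": server_maxconn,
        "frontend_lower_bound": lower_bound,
        "frontend_maxconn": frontend_maxconn,
        "global_maxconn": frontend_maxconn + global_margin,
    }

# Five servers whose real limit is an Apache MaxClients of 256,
# behind a haproxy machine with 12 GB of RAM.
print(size_maxconn([256] * 5, 12 * GB))
```

With these example inputs the servers get 230 each, the frontend 3450
and the global section 3550. The RAM cap only starts to matter on a
much smaller haproxy machine or with much larger server limits.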

To give you an idea, I once got a report from a site running at 150000
concurrent connections in 4.5 GB of RAM for haproxy. They had tuned it
very nicely, and this is the highest value I have ever observed in
production.
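
As a quick sanity check, those numbers line up with the rule of thumb
of about 32 kB per connection for haproxy. A minimal sketch, assuming
"GB" means 2^30 bytes (the helper name is illustrative):

```python
GB = 1024 ** 3

def bytes_per_connection(total_bytes, connections):
    """Average memory used per connection."""
    return total_bytes / connections

per_conn = bytes_per_connection(4.5 * GB, 150_000)
print(round(per_conn / 1024, 1))  # about 31.5 kB per connection
```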

> Also, if there are any other config options I should consider
> I'd appreciate any advice as this is a new setup for us and we're
> learning as we go.

You can lower the buffer size in order to reduce memory usage. In your
case, though, I don't think that is really needed. Other config
tunables greatly depend on the workload. Long-lived connections are not
tuned the same way as short-lived, and you don't tune equally for large
and small objects.

However, what is certain is that if you blindly tune for too large a
load, you can run into trouble before you reach your maximal load, and at
this point you'll have to make tradeoffs between settings that have
always been there without knowing if they're really required, and the
ones you'd like to add to support a load increase. So you should really
tune for something close to your current usage. If you peak at 500
concurrent connections, tune for 2000 and observe.

Regards,
Willy

