Thanks Patrick, I'll look and see if I can figure out a clean change for this.
It was the kernel limit on the max number of open fds for the process where 
the problem showed up (not a zk limit). FWIW, we tested with a process fd 
limit of 16K, and ZK performed reasonably well until the fd limit was reached, 
at which point it choked. There was some throughput degradation, but mostly 
going from 0 to 4000 connections; 4000 to 16000 was mostly flat until the 
sharp drop. For our use case a bit of performance loss with huge numbers of 
connections is fine, so long as we can handle the choke, which for the initial 
rollout I'm planning to just monitor for.

C

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org] 
Sent: Wednesday, October 20, 2010 2:06 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: implications of netty on client connections

It may just be the case that we haven't tested sufficiently for this scenario
(running out of fds) and we need to handle it better even in nio, probably by
cutting off "op_connect" in the selector. We should be able to do something
similar in netty.
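
Something along these lines on the nio side, just to sketch the idea (the
watermarks and the choice of toggling OP_ACCEPT interest on the listen
socket's key are illustrative only, not what the server does today):

    import java.nio.channels.SelectionKey;

    class AcceptThrottle {
        // Sketch only: stop accepting new connections when close to the
        // process fd limit, resume once back under a lower watermark.
        // "acceptKey" is the listen socket's SelectionKey; openFds/maxFds
        // would come from the MXBean linked below. The headroom values are
        // made up for illustration.
        static void throttle(SelectionKey acceptKey, long openFds, long maxFds) {
            boolean accepting =
                (acceptKey.interestOps() & SelectionKey.OP_ACCEPT) != 0;
            if (accepting && openFds >= maxFds - 64) {
                // near the limit: stop selecting for accepts rather than
                // letting accept() fail with "too many open files"
                acceptKey.interestOps(
                    acceptKey.interestOps() & ~SelectionKey.OP_ACCEPT);
            } else if (!accepting && openFds < maxFds - 256) {
                // comfortably under the limit again: resume accepting
                acceptKey.interestOps(
                    acceptKey.interestOps() | SelectionKey.OP_ACCEPT);
            }
        }
    }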

Btw, on unix one can access the open/max fd count using this:
http://download.oracle.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html
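
A quick example (class name is just for illustration; on non-Sun or non-unix
JVMs the instanceof check simply fails and nothing is printed):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import com.sun.management.UnixOperatingSystemMXBean;

    public class FdCount {
        public static void main(String[] args) {
            OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof UnixOperatingSystemMXBean) {
                UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                // current number of open fds vs. the process limit
                System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                        + " / max fds: " + unix.getMaxFileDescriptorCount());
            }
        }
    }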


Secondly, are you running into a kernel limit or a zk limit? Take a look at
this post describing 1 million concurrent connections to a box:
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3

specifically:
--------------

During various tests with lots of connections, I ended up making some
additional changes to my sysctl.conf. This was partly trial-and-error; I don't
really know enough about the internals to make especially informed decisions
about which values to change. My policy was to wait for things to break,
check /var/log/kern.log to see what mysterious error was reported, then
increase stuff that sounded sensible after a spot of googling. Here are the
settings in place during the above test:

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535

------------------


I'm guessing that even with this, at some point you'll run into a limit in
our server implementation. In particular I suspect that we may start to
respond more slowly to pings, eventually getting bad enough to time out.
We'd have to debug that and address it (optimize).

Patrick

On Tue, Oct 19, 2010 at 7:16 AM, Fournier, Camille F. [Tech] <
camille.fourn...@gs.com> wrote:

> Hi everyone,
>
> I'm curious what the implications of using netty are going to be for the
> case where a server gets close to its max available file descriptors. Right
> now our somewhat limited testing has shown that a ZK server performs fine
> until it runs out of available fds, at which point performance degrades
> sharply and new connections get into a somewhat bad state. Is netty going
> to enable the server to handle this situation more gracefully (or is there
> a way to do this already that I haven't found)? Limiting connections from
> the same client is not enough, since for certain use cases we might
> consider, we can potentially have far more clients wanting to connect than
> available fds.
>
> Thanks,
> Camille
>
>
