Hi Gavin,

I see, so the registrar module is not involved. As an exercise, increase tcp_connection_lifetime to 7200 (2 h), just to rule out the possibility of connections timing out.

Are you saying that running a constant load of 50K TCP connections for a long time does not result in any TCP errors?

Now, regarding the processes: yes, it looks like TCP main is the one with the extra load. This process is responsible for managing the TCP connections. It is not accepting, reading or writing anything; it detects events on the TCP sockets and dispatches them to the TCP worker processes.
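To illustrate that division of labour, here is a conceptual model in Python (not OpenSIPS's C code): a "main" loop only detects readiness and queues the socket, and a "worker" reads it and then hands it back. The function names are hypothetical, and the Queue is an assumed stand-in for the hand-off OpenSIPS performs between its TCP main and TCP worker processes.

```python
import selectors
import socket
from queue import Queue


def dispatch_ready(sel: selectors.BaseSelector, work: Queue, timeout: float = 0.1) -> int:
    """One pass of the 'main' loop: find sockets with pending events and hand them off.

    The main loop never reads from the socket itself; it only detects readiness.
    """
    handed_off = 0
    for key, _mask in sel.select(timeout):
        # Stop watching the socket while a worker owns it, so the same
        # event is not dispatched twice.
        sel.unregister(key.fileobj)
        work.put(key.fileobj)
        handed_off += 1
    return handed_off


def worker_pass(work: Queue, sel: selectors.BaseSelector) -> list:
    """One pass of a 'worker': read each queued socket, then give it back."""
    received = []
    while not work.empty():
        conn = work.get()
        received.append(conn.recv(4096))
        # Return the connection to the main loop for further event detection.
        sel.register(conn, selectors.EVENT_READ)
    return received


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    work: Queue = Queue()
    client, server_side = socket.socketpair()
    sel.register(server_side, selectors.EVENT_READ)
    client.sendall(b"REGISTER sip:example.com SIP/2.0\r\n\r\n")
    print(dispatch_ready(sel, work))  # expected: 1
    print(worker_pass(work, sel))
```

Even in this toy model, the main loop touches every event on every connection, which is why a single process doing only event detection can still end up the busiest one at tens of thousands of connections.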

Do you have a test suite or similar tool for generating the traffic of 50K clients?
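For reference, a minimal sketch of what such a load driver does, under stated assumptions: each "client" opens its own TCP connection and sends one REGISTER on it, and the socket is kept open. The helper names, the `user{i}` naming and the 3600 s Expires are hypothetical; a dedicated SIP traffic generator such as SIPp may be more practical for sustained tests.

```python
import socket
import uuid


def build_register(user: str, domain: str, local_host: str, local_port: int,
                   expires: int = 3600) -> bytes:
    """Build a minimal SIP REGISTER for a TCP client (illustrative, not exhaustive)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic-cookie prefix
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/TCP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{domain}>;tag={tag}",
        f"To: <sip:{user}@{domain}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{user}@{local_host}:{local_port};transport=tcp>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings; an empty line terminates the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def open_clients(proxy_host: str, proxy_port: int, count: int, domain: str) -> list:
    """Open `count` TCP connections and send one REGISTER on each.

    The sockets are returned (and kept open) so the connections stay established.
    """
    clients = []
    for i in range(count):
        sock = socket.create_connection((proxy_host, proxy_port), timeout=5)
        local_host, local_port = sock.getsockname()[:2]
        sock.sendall(build_register(f"user{i}", domain, local_host, local_port))
        clients.append(sock)
    return clients
```

Note that a single source IP talking to one proxy address and port is bounded by the ephemeral port range (net.ipv4.ip_local_port_range on Linux, roughly 28K ports by default), so 50K+ clients usually need several source addresses or a wider range, plus a high descriptor limit on the driver side too.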

Regards,
Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com

On 04/30/2013 10:35 PM, Gavin Murphy wrote:
The tcp_persistent_flag isn't set, as it appears to be a registrar module parameter and we aren't using that module. We're passing REGISTERs through to our own registrar.

Here is a top snapshot from a test currently running with 50K concurrent TCP "clients" (not all of the opensips processes are shown). This level of traffic is not generating any TCP-related errors in opensips.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3411 rcsuser   20   0 6516m 3.1g 3.1g R   54 39.5  73:14.06 opensips
 3376 rcsuser   20   0 6516m 221m 219m S   11  2.8  14:07.50 opensips
 3375 rcsuser   20   0 6516m 221m 219m S   10  2.8  13:57.23 opensips
 3373 rcsuser   20   0 6516m 221m 219m S    9  2.8  14:10.93 opensips
 3374 rcsuser   20   0 6516m 221m 219m S    9  2.8  14:04.26 opensips
 3377 rcsuser   20   0 6516m 1608  200 S    0  0.0   0:01.44 opensips
 3379 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.52 opensips
 3380 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.65 opensips
 3381 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.38 opensips
 3382 rcsuser   20   0 6516m  47m  39m S    0  0.6   0:14.56 opensips
 3385 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.52 opensips
 3386 rcsuser   20   0 6516m  49m  41m S    0  0.6   0:14.67 opensips
 3390 rcsuser   20   0 6516m  49m  41m S    0  0.6   0:14.50 opensips
 3394 rcsuser   20   0 6516m  47m  39m S    0  0.6   0:14.42 opensips
 3395 rcsuser   20   0 6516m  47m  39m S    0  0.6   0:14.44 opensips
 3396 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.72 opensips
 3401 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.72 opensips
 3402 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.75 opensips
 3403 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.78 opensips
 3404 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.60 opensips
 3408 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.49 opensips
 3409 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.75 opensips
 3410 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.61 opensips

And here is the output of the fifo ps command:

Process::  ID=0 PID=3367 Type=attendant
Process::  ID=1 PID=3368 Type=MI FIFO
Process::  ID=2 PID=3369 Type=SIP receiver udp:127.0.0.1:9050
Process::  ID=3 PID=3370 Type=SIP receiver udp:127.0.0.1:9050
Process::  ID=4 PID=3371 Type=SIP receiver udp:127.0.0.1:9050
Process::  ID=5 PID=3372 Type=SIP receiver udp:127.0.0.1:9050
Process::  ID=6 PID=3373 Type=SIP receiver udp:192.168.38.175:9050
Process::  ID=7 PID=3374 Type=SIP receiver udp:192.168.38.175:9050
Process::  ID=8 PID=3375 Type=SIP receiver udp:192.168.38.175:9050
Process::  ID=9 PID=3376 Type=SIP receiver udp:192.168.38.175:9050
Process::  ID=10 PID=3377 Type=time_keeper
Process::  ID=11 PID=3378 Type=timer
Process::  ID=12 PID=3379 Type=TCP receiver
Process::  ID=13 PID=3380 Type=TCP receiver
Process::  ID=14 PID=3381 Type=TCP receiver
Process::  ID=15 PID=3382 Type=TCP receiver
Process::  ID=16 PID=3383 Type=TCP receiver
Process::  ID=17 PID=3384 Type=TCP receiver
Process::  ID=18 PID=3385 Type=TCP receiver
Process::  ID=19 PID=3386 Type=TCP receiver
Process::  ID=20 PID=3387 Type=TCP receiver
Process::  ID=21 PID=3388 Type=TCP receiver
Process::  ID=22 PID=3389 Type=TCP receiver
Process::  ID=23 PID=3390 Type=TCP receiver
Process::  ID=24 PID=3391 Type=TCP receiver
Process::  ID=25 PID=3392 Type=TCP receiver
Process::  ID=26 PID=3393 Type=TCP receiver
Process::  ID=27 PID=3394 Type=TCP receiver
Process::  ID=28 PID=3395 Type=TCP receiver
Process::  ID=29 PID=3396 Type=TCP receiver
Process::  ID=30 PID=3397 Type=TCP receiver
Process::  ID=31 PID=3398 Type=TCP receiver
Process::  ID=32 PID=3399 Type=TCP receiver
Process::  ID=33 PID=3400 Type=TCP receiver
Process::  ID=34 PID=3401 Type=TCP receiver
Process::  ID=35 PID=3402 Type=TCP receiver
Process::  ID=36 PID=3403 Type=TCP receiver
Process::  ID=37 PID=3404 Type=TCP receiver
Process::  ID=38 PID=3405 Type=TCP receiver
Process::  ID=39 PID=3406 Type=TCP receiver
Process::  ID=40 PID=3407 Type=TCP receiver
Process::  ID=41 PID=3408 Type=TCP receiver
Process::  ID=42 PID=3409 Type=TCP receiver
Process::  ID=43 PID=3410 Type=TCP receiver
Process::  ID=44 PID=3411 Type=TCP main

So is it correct to assume that the "TCP main" process is responsible for accepting each initial connection and handing it off to one of the "TCP receiver" processes? Is that why it uses the most CPU and memory? If so, are memory and CPU the only factors limiting how many connections we can establish concurrently?

Gavin

On 29/04/2013 9:48 AM, Bogdan-Andrei Iancu wrote:
Hello Gavin,

The errors you get indicate that OpenSIPS is trying to open a TCP connection to a destination that refuses it. Based on your description, there should be no need for OpenSIPS to open TCP connections; they are opened by the clients when registering.

Ruling out misrouting, the only explanation is that the TCP connections expire (time out without traffic) long before the corresponding registrations, so you end up with a registration (in usrloc) that has no TCP connection towards the actual device. Are you using the tcp_persistent_flag?
http://www.opensips.org/html/docs/modules/1.9.x/registrar.html#id250105
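The timing mismatch described above can be put as simple arithmetic. This is a sketch under assumptions not stated in the thread: the client is silent between REGISTER refreshes, the refresh interval is a hypothetical 3600 s, and the idle timer restarts on every message. During the resulting window the registrar still holds a binding while the TCP connection is gone, so any message routed to that client would need a new outbound connection, which the client refuses.

```python
def stale_binding_window(tcp_connection_lifetime: int, refresh_interval: int) -> int:
    """Seconds per refresh cycle in which the registrar holds a binding whose TCP connection is gone.

    Assumes the client is silent between REGISTER refreshes sent every
    `refresh_interval` seconds, and that the idle timer restarts on each message.
    """
    return max(0, refresh_interval - tcp_connection_lifetime)


if __name__ == "__main__":
    # 3600 s is a hypothetical refresh interval; the thread does not state one.
    for lifetime in (610, 7200):
        print(lifetime, stale_binding_window(lifetime, 3600))  # prints "610 2990", then "7200 0"
```

With tcp_connection_lifetime=610 the window would be 2990 s per cycle; raising the lifetime to 7200, as suggested at the top of this thread, closes it for that refresh interval.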

About the load on the processes: run "opensipsctl fifo ps" to get a listing of the processes and their descriptions, then correlate it with the top output to see which process is burning CPU.
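The correlation can be scripted. A minimal sketch, assuming top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and the "Process:: ID=… PID=… Type=…" line format shown in this thread; the function names are hypothetical.

```python
import re

PS_LINE = re.compile(r"Process::\s+ID=(\d+)\s+PID=(\d+)\s+Type=(.+)")


def parse_fifo_ps(text: str) -> dict:
    """Map PID -> process description from `opensipsctl fifo ps` output."""
    procs = {}
    for line in text.splitlines():
        m = PS_LINE.search(line)
        if m:
            procs[int(m.group(2))] = m.group(3).strip()
    return procs


def parse_top(text: str) -> dict:
    """Map PID -> %CPU from top rows in the default column layout.

    %CPU is the 9th whitespace-separated field; header lines are skipped
    because their first field is not numeric.
    """
    cpu = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 12 and fields[0].isdigit():
            cpu[int(fields[0])] = float(fields[8])
    return cpu


def busiest(ps_text: str, top_text: str, n: int = 5) -> list:
    """Return the n busiest processes as (pid, %cpu, description) tuples."""
    procs = parse_fifo_ps(ps_text)
    cpu = parse_top(top_text)
    rows = [(pid, pct, procs.get(pid, "unknown")) for pid, pct in cpu.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]
```

Fed the 50K-connection snapshot and ps listing from this thread, it would pair PID 3411 (54% CPU) with "TCP main" and PIDs 3373-3376 (9-11%) with the UDP SIP receivers on 192.168.38.175:9050.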

Regards,

Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com


On 04/26/2013 05:44 PM, Gavin Murphy wrote:
We're trying to load up opensips with as many TCP connections as we possibly can. So far we've got it to about 82K, but failures start occurring at that point. We have 8 GB of RAM allocated to the server as a whole (is that enough? We don't appear to be exhausting it). We've set the following parameters for OpenSIPS:

tcp_children=32
tcp_max_connections=250000
tcp_connection_lifetime=610
tcp_keepalive=1
tcp_keepcount=3
tcp_keepidle=300
tcp_keepinterval=300
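One useful check on these values: with Linux TCP keepalive, a dead idle peer is only detected after the idle time plus all probes, i.e. 300 + 3 * 300 = 1200 s here, roughly twice tcp_connection_lifetime=610. The sketch below assumes OpenSIPS maps tcp_keepidle, tcp_keepinterval and tcp_keepcount onto the standard Linux per-socket options; the function names are hypothetical.

```python
import socket


def dead_peer_detection_seconds(keepidle: int, keepinterval: int, keepcount: int) -> int:
    """Worst-case time for Linux TCP keepalive to declare an idle peer dead.

    The kernel waits `keepidle` seconds of idleness, then sends up to
    `keepcount` probes `keepinterval` seconds apart before dropping the connection.
    """
    return keepidle + keepcount * keepinterval


def apply_keepalive(sock: socket.socket, keepidle: int, keepinterval: int, keepcount: int) -> None:
    """Enable keepalive and set its per-socket timers (Linux-specific constants)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, keepidle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, keepinterval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, keepcount)
```

With these settings, a silent connection would hit the 610 s idle lifetime well before keepalive could declare the peer dead.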

We have also set ulimit -n 1024000 and ulimit -s 768.
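A quick way to confirm these limits actually apply to the running process. This is a sketch: ulimit only affects processes started from the shell (or init script) that set it, each TCP connection needs at least one descriptor, and ulimit -n cannot exceed the kernel's fs.nr_open (1048576 by default on most kernels). The helper names are hypothetical.

```python
import resource


def fd_limit_ok(required: int) -> bool:
    """True if this process's soft RLIMIT_NOFILE allows at least `required` descriptors."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= required


def system_file_max(path: str = "/proc/sys/fs/file-max"):
    """System-wide open-file ceiling on Linux, or None where it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # 250000 mirrors tcp_max_connections above; real needs are higher
    # (listeners, log files, IPC sockets).
    print("nofile ok:", fd_limit_ok(250000), "file-max:", system_file_max())
```

For the running daemon itself, the effective limits can be read from /proc/<pid>/limits, using a PID from the fifo ps listing.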

The scenario is that our load driver establishes "client" connections to OpenSIPS via TCP and sends REGISTERs over those connections. While the REGISTERs come in over TCP, they are sent out to our registrar via UDP. At around the 40K connection mark, we start seeing the following in the logs:

Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]: ERROR:core:tcp_blocking_connect: poll error: flags 1c
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]: ERROR:core:tcp_send: connect failed
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]: ERROR:tm:msg_send: tcp_send failed

It appears as though opensips is trying to establish a connection somewhere and is being refused, except that it shouldn't be trying to establish any, unless it's for internal purposes. Unfortunately, the logs don't say which connection it is trying to establish.
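The logs do carry the PID of the failing process, which can be matched against an "opensipsctl fifo ps" listing taken during the same run. A minimal sketch, assuming the syslog line format shown above; the pattern and function name are hypothetical.

```python
import re
from collections import Counter

# Matches syslog lines such as:
# Apr 25 12:28:19 host rcsuser-opensips[27540]: ERROR:core:tcp_send: connect failed
LOG_LINE = re.compile(r"opensips\[(\d+)\]: ERROR:([\w:]+): (.*)")


def count_errors(lines) -> Counter:
    """Count ERROR lines per (pid, module:function) for matching against `fifo ps`."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m:
            counts[(int(m.group(1)), m.group(2))] += 1
    return counts
```

For the five lines above, every error comes from PID 27540, which in the top snapshot below is one of the four processes at 11-16% CPU rather than the busiest one.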

One other puzzling thing: one of the opensips processes seems to be bearing the brunt of the load. I assume it's the process that actually accepts the connections, and that the subsequent (low) traffic is then handed off to the children. But if that's the case, it is handling a large share of the workload, and I was hoping it would be more evenly distributed.

Here is a snapshot of the opensips processes in top:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
27577 rcsuser   20   0 6516m 2.5g 2.5g R   76 31.9   8:15.26 opensips
27542 rcsuser   20   0 6516m 181m 180m S   16  2.3   0:54.60 opensips
27541 rcsuser   20   0 6516m 182m 180m S   14  2.3   0:54.47 opensips
27539 rcsuser   20   0 6516m 182m 180m S   13  2.3   0:53.75 opensips
27540 rcsuser   20   0 6516m 182m 180m S   11  2.3   0:53.64 opensips
27545 rcsuser   20   0 6516m  37m  29m S    0  0.5   0:01.03 opensips
27551 rcsuser   20   0 6516m  35m  27m S    0  0.4   0:00.94 opensips
27553 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.95 opensips
27555 rcsuser   20   0 6516m  37m  29m S    0  0.5   0:00.99 opensips
27557 rcsuser   20   0 6516m  35m  27m S    0  0.4   0:00.92 opensips
27558 rcsuser   20   0 6516m  35m  27m S    0  0.4   0:00.90 opensips
27560 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.98 opensips
27563 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.94 opensips
27564 rcsuser   20   0 6516m  36m  27m S    0  0.5   0:00.93 opensips
27565 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.93 opensips
27567 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.95 opensips
27575 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.95 opensips
27576 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.98 opensips

So what I'm looking for is help tuning the operating system and opensips to the point where we can get substantially more than 80K connections. Or am I asking too much?

Thanks,

Gavin


_______________________________________________
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users



--


Gavin Murphy

Vice President & CTO, NewPace
phone +1 (902) 406–8375  x1002
email gavin.mur...@newpace.com
aim gavin.murphy@newpace.com