Re: [OpenSIPS-Users] Autoscaler in 3.2.x

Bogdan-Andrei Iancu Mon, 12 Sep 2022 02:56:35 -0700

Hi Yury,

Maybe you can get a trap output while the procs are at 100% and before everything dies?


Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
  https://www.opensips.org/events/Summit-2022Athens/

On 9/12/22 11:12 AM, Yury Kirsanov wrote:

Hi Bogdan,
We've run into another issue. This time I was restarting the OpenSIPS server during busy hours, when about 2500 SIP devices were registering and making calls (the number of dialogs was only around 100-200, but there were a lot of packets), and I was unable to restart OpenSIPS successfully. Some processes got stuck almost immediately at 100% load, then started to consume more and more memory; after eating up all the memory they died and OpenSIPS stopped processing SIP packets.
I believe it's similar to the autoscaler issue. In this case I had only 16 UDP workers and 16 TCP workers, and it took OpenSIPS longer to run into the issue; when I had the autoscaler on, it wasn't able to open that many processes at once, so the currently active ones got stuck very fast and the crash happened almost immediately.
I'm running a localhost REDIS cache to store where to proxy each SIP packet to, and if there's no record for a SIP device I query a REST server and cache its response. The REST server load was no more than 25% during the restart, when all SIP devices were urgently trying to reconnect to OpenSIPS, so I don't think either of them is the issue.
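For readers following the thread, the lookup flow described above is a cache-aside pattern. The Python sketch below is only an illustration under stated assumptions, not the actual OpenSIPS script: a plain dict stands in for the local REDIS cache, `fake_rest` is a hypothetical stand-in for the async REST query, and the names `RouteCache` and `destination_for` are invented for this example.

```python
from typing import Callable, Dict, Optional


class RouteCache:
    """Cache-aside lookup: try the cache first, fall back to a slower backend."""

    def __init__(self, fetch_from_rest: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # stand-in for the local REDIS cache
        self._fetch = fetch_from_rest     # stand-in for the REST query
        self.backend_calls = 0            # counts how often the backend is hit

    def destination_for(self, device: str) -> Optional[str]:
        cached = self._cache.get(device)
        if cached is not None:
            return cached                 # cache hit: no backend call
        self.backend_calls += 1
        destination = self._fetch(device)
        if destination is not None:
            self._cache[device] = destination  # remember the answer
        return destination


def fake_rest(device: str) -> Optional[str]:
    # Hypothetical REST answer; the address below is made up for illustration.
    return {"alice": "10.0.0.5:5060"}.get(device)


cache = RouteCache(fake_rest)
first = cache.destination_for("alice")   # cache miss, goes to the backend
second = cache.destination_for("alice")  # cache hit, backend not called again
```

In this sketch a device with no record on the REST side is never cached, so every lookup for it reaches the backend again.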
I'm using async REST calls and believe there should be no issues with my configuration script, even though it runs a lot of nested routes due to the async REST requests. Hopefully I didn't forget an 'exit' statement somewhere, but if that were the case the OpenSIPS service would be locking up at any time.
OpenSIPS itself is running as a virtual machine on a VMware host, and I could see it consuming up to 100% CPU of a 40-core host when it was locking up. VMware readiness for the VM was also spiking to 1500 ms during these lock-ups, meaning that the VM was waiting for some cores to free up before it could get CPU time.
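To put the 1500 ms figure in context, a ready-time value in milliseconds is commonly converted to a percentage of the chart's sample interval. The sketch below assumes the value was read from a real-time performance chart with a 20-second sample interval; the message does not say which chart was used, so both that interval and the function name `cpu_ready_percent` are assumptions made for this illustration.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready value (ms) into a percentage of the sample interval.

    interval_s defaults to 20 seconds, which is an assumption about the
    chart the figure came from; pass a different interval if needed.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms / (interval_s * 1000.0) * 100.0


# 1500 ms of ready time within an assumed 20 s sample
spike = cpu_ready_percent(1500)
```

Under that assumption the spike works out to about 7.5% of the interval. Whether the figure is per vCPU or summed over all vCPUs of the VM changes how it should be read.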
The only way out of this situation for me was to run multiple OpenSIPS VMs and spread the load between them. No matter what I tried, I wasn't able to get OpenSIPS running properly again, even though it had been working perfectly fine for more than a week in this configuration and under the same load; during that week, though, I was starting/restarting it only at night, when there were no active calls.
I'm happy to share my configuration file with you privately if required.

Hope this helps!

Thanks and best regards,
Yury.
On Wed, Sep 7, 2022 at 9:54 PM Bogdan-Andrei Iancu <bog...@opensips.org> wrote:
    Hi Yury,

    Thanks for the detailed info here - let me do a review of some code
    and run some tests, as at this point I have a good idea of the
    direction to dig in.

    I will update here.

    Best regards,

    Bogdan-Andrei Iancu

    OpenSIPS Founder and Developer
       https://www.opensips-solutions.com
    OpenSIPS Summit 27-30 Sept 2022, Athens
       https://www.opensips.org/events/Summit-2022Athens/

    On 9/6/22 11:24 AM, Yury Kirsanov wrote:
    Hi Bogdan,
    Yes, I'm listening on all types of sockets including UDP, TCP and
    TLS on the outside public interface and then forward traffic into
    internal LAN via UDP only.

    Previously it was getting stuck quite easily, now I had to wait
    for a while before this actually happened. I've routed part of my
    customers to this server to obtain this result so I will have to
    do that again.

    As soon as I see one of the processes stuck I'll run the trap
    command and send you all the details, including process load, ps
    output and so on.

    For now I had to switch autoscaling off and just create many
    listeners. Do I understand correctly that I need to restart
    OpenSIPS in order to apply autoscaling profiles and reload-routes
    is not sufficient?

    Also, do I need separate UDP profiles for the public and private
    interfaces? And do I need to apply the autoscaling profile just to
    a socket, or do I need to specify udp_workers or tcp_workers with
    the autoscaler too?

    Thanks and best regards,
    Yury.

    On Tue, 6 Sept 2022, 18:18 Bogdan-Andrei Iancu,
    <bog...@opensips.org> wrote:

        Hi Yury,

        Thanks for the info. I see that the stuck process (24) is an
        auto-scaled one (based on its id). Do you have SIP traffic
        going from UDP to TCP, or are you doing some HEP capturing for
        SIP? I saw a recent similar report where a UDP auto-scaled
        worker got stuck when trying to communicate with the TCP
        main/manager process (in order to handle a TCP operation).

        BTW, any chance you could run "opensips-cli -x trap" when you
        have that stuck process, just to see where it is stuck? And is
        it hard to reproduce? I may ask you to extract some
        information from the running process...

        Regards,

        Bogdan-Andrei Iancu

        OpenSIPS Founder and Developer
           https://www.opensips-solutions.com
        OpenSIPS Summit 27-30 Sept 2022, Athens
           https://www.opensips.org/events/Summit-2022Athens/

        On 9/3/22 6:54 PM, Yury Kirsanov wrote:

_______________________________________________
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users
