A backtrace should give you enough information to know where to look
for issues, and capturing one should not take long.
Maybe you can use monit to monitor the cpu and on failure run 'kamctl
trap' to get the backtrace.
 if cpu is greater than 50% for 5 cycles then exec "/usr/sbin/kamctl trap"
Make sure that you have the debug rpm installed, so that the backtrace
includes symbols.
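
To put the monit rule above in context, a complete stanza might look
roughly like the sketch below. The pidfile path is an assumption on my
part, so adjust it to your install; I have not tested this against your
setup.

```
# Sketch only: the pidfile path below is an assumption, adjust for your system.
check process kamailio with pidfile /var/run/kamailio.pid
  # Run 'kamctl trap' once CPU stays above 50% for 5 consecutive monit cycles.
  if cpu > 50% for 5 cycles then exec "/usr/sbin/kamctl trap"
```

Keep in mind that a cycle is one monit polling interval (the "set daemon"
value), so 5 cycles at a 30-second interval means roughly two and a half
minutes of sustained load before the trap fires.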

-ovidiu

On Tue, Sep 29, 2015 at 1:40 PM, Alex Balashov
<abalas...@evaristesys.com> wrote:
> Hi,
>
> Thanks very much to you and Ovidiu for the responses. I didn't mean to leave
> this thread hanging. See inline:
>
> On 09/28/2015 05:51 PM, Daniel-Constantin Mierla wrote:
>
>> Were you pulling the backtraces based on the script you pasted in your
>> previous email? That should be a good source of information to analyze
>> what kamailio was doing.
>
>
> Yes, although as yet I have not been able to actually get the operator to
> run a backtrace at the time of the deadlock. It's a psychological and
> political problem: they are so eager to restore service that they do not
> have the discipline to run my debug script, and jump straight to restarting
> Kamailio.
>
> However, the biggest problem that I see is that if the backtraces reveal
> something interesting, it may invite follow-up, e.g. examination of other
> frames and values. That would require a core dump. Dumping core for all 8-12
> child processes would take several minutes, as the shm pool is quite large
> (4 GB). This is a very high-volume installation. The operator would never go
> for that.
>
> So, if I do get an intriguing backtrace, I don't really know what else to do
> to elaborate.
>
>> As I already said, if there is a mutex deadlock, it will also show up as
>> high cpu usage. Was that the case, or do you not have any access to cpu
>> usage history?
>
>
> I don't have CPU usage history, but I will try to get one next time this
> happens.
>
>> If it is just no more sip message routing, but no high cpu usage, then:
>>
>> - maybe processes were blocked in a lengthy I/O operation (e.g., query
>> to database)
>
>
> That's certainly possible. The backtrace will surely reveal that.
>
>> - maybe someone/something was resetting the network interface (the
>> sockets were bound to the previous address) -- e.g., it can be done by
>> some upgrades of OS or dhcp
>
>
> No, that definitely is not the case.
>
>> - maybe some limits of OS were reached, the packets were filtered by
>> kernel (if you have centos with selinux, be sure it is properly
>> configured)
>
>
> I am aware of CentOS's ridiculous default ulimits in CentOS 6.6, and all of
> these have been appropriately set to infinity. SELinux is disabled.
>
> I'll let you know what I find. Thanks for the input!
>
> -- Alex
>
> --
> Alex Balashov | Principal | Evariste Systems LLC
> 303 Perimeter Center North, Suite 300
> Atlanta, GA 30346
> United States
>
> Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct)
> Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
>
> _______________________________________________
> SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
> sr-users@lists.sip-router.org
> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users



-- 
VoIP Embedded, Inc.
http://www.voipembedded.com
