My guess is that there is not enough shared memory. When an allocation failes OpenSIPS tries to defragment memory to make room which takes a lot of time and must be done under lock.
Please try to increase the shared memory size and tell me if it persists. Alex Massover wrote: > Hi! > > Yes, with -DF_MALLOC. > > 1.6.1 from sources, I build deb package. > I use 128M of shared and 10*1024*1024 private memory (can increase - no > problem). > > Hmmmm, "opensipsctl fifo get_statistics all" crashes/stops the opensips. > > 'fifo uptime' or 'fifo debug' are OK. > > strace while 'fifo get_statistics all': > Process 9509 attached - interrupt to quit > pause() = ? ERESTARTNOHAND (To be restarted) > --- SIGUSR2 (User defined signal 2) @ 0 (0) --- > sigreturn() = ? (mask now []) > pause() = ? ERESTARTNOHAND (To be restarted) > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > waitpid(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGUSR2}], WNOHANG) = 9520 > waitpid(-1, 0xbf84b4c8, WNOHANG) = 0 > kill(0, SIGTERM) = 0 > --- SIGTERM (Terminated) @ 0 (0) --- > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now [TERM]) > sigreturn() = ? (mask now []) > rt_sigaction(SIGALRM, {0x8065920, [ALRM], SA_RESTART}, {SIG_DFL}, 8) = 0 > alarm(60) = 0 > wait4(-1, NULL, 0, NULL) = 9514 > wait4(-1, NULL, 0, NULL) = 9519 > wait4(-1, NULL, 0, NULL) = 9521 > wait4(-1, NULL, 0, NULL) = 9522 > wait4(-1, NULL, 0, NULL) = 9512 > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > wait4(-1, NULL, 0, NULL) = 9510 > wait4(-1, NULL, 0, NULL) = 9516 > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > wait4(-1, NULL, 0, NULL) = 9515 > wait4(-1, NULL, 0, NULL) = 9517 > wait4(-1, NULL, 0, NULL) = 9524 > wait4(-1, NULL, 0, NULL) = 9525 > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > --- SIGCHLD (Child exited) @ 0 (0) --- > sigreturn() = ? (mask now []) > wait4(-1, NULL, 0, NULL) = 9511 > wait4(-1, NULL, 0, NULL) = 9513 > wait4(-1, NULL, 0, NULL) = 9518 > wait4(-1, NULL, 0, NULL) = 9523 > wait4(-1, NULL, 0, NULL) = -1 ECHILD (No child processes) > rt_sigaction(SIGALRM, {0x8066080, [ALRM], SA_RESTART}, {0x8065920, [ALRM], > SA_RESTART}, 8) = 0 > stat64("/tmp/opensips_fifo", {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0 > unlink("/tmp/opensips_fifo") = 0 > munmap(0xaed25000, 134217728) = 0 > unlink("/var/run/opensips/opensips.pid") = 0 > alarm(0) = 60 > rt_sigaction(SIGALRM, {SIG_IGN}, {0x8066080, [ALRM], SA_RESTART}, 8) = 0 > exit_group(0) = ? > Process 9509 detached > > -- > Best Regards, > Alex Massover > VoIP R&D TL > Jajah Inc. > > >> -----Original Message----- >> From: users-boun...@lists.opensips.org [mailto:users- >> boun...@lists.opensips.org] On Behalf Of Andrei Dragus >> Sent: Thursday, January 21, 2010 3:09 PM >> To: OpenSIPS users mailling list >> Subject: Re: [OpenSIPS-Users] sched_yield() >> >> >> Hi, >> >> Since all the backtraces are in allocation routines my guess is that >> the >> shared memory lock might be causing a problem. >> >> Are you compiling with -DF_MALLOC? >> What version of OpenSIPS are you using? >> What is the total shared memory pool you are allocating? >> What amount of memory are you using? ( Use : opensipsctl fifo >> get_statistics all ) >> >> Alex Massover wrote: >> >>> Some more, >>> >>> (gdb) bt >>> #0 0xb78dc424 in __kernel_vsyscall () >>> #1 0xb781741c in sched_yield () from /lib/i686/cmov/libc.so.6 >>> #2 0xb73d77fd in build_new_dlg () from >>> >> /usr/lib/opensips/modules/dialog.so >> >>> #3 0xb73d4b81 in dlg_create_dialog () from >>> >> /usr/lib/opensips/modules/dialog.so >> >>> #4 0xb73c9c9e in ?? () from /usr/lib/opensips/modules/dialog.so >>> #5 0x08055030 in do_action () >>> #6 0x08053ebf in run_action_list () >>> #7 0x08056e7a in do_action () >>> #8 0x08053ebf in run_action_list () >>> #9 0x08057d99 in run_top_route () >>> #10 0x0808ad6c in receive_msg () >>> #11 0x080bd2f2 in udp_rcv_loop () >>> #12 0x08069339 in main () >>> >>> >>> (gdb) bt >>> #0 0xb78dc424 in __kernel_vsyscall () >>> #1 0xb781741c in sched_yield () from /lib/i686/cmov/libc.so.6 >>> #2 0xb77242cd in build_cell () from /usr/lib/opensips/modules/tm.so >>> #3 0xb7739c4a in t_newtran () from /usr/lib/opensips/modules/tm.so >>> #4 0xb772e7b8 in t_relay_to () from /usr/lib/opensips/modules/tm.so >>> #5 0xb773b501 in ?? () from /usr/lib/opensips/modules/tm.so >>> #6 0x08055030 in do_action () >>> #7 0x08053ebf in run_action_list () >>> #8 0x08095cf2 in eval_expr () >>> #9 0x080958d9 in eval_expr () >>> #10 0x08095919 in eval_expr () >>> #11 0x080554e2 in do_action () >>> #12 0x08053ebf in run_action_list () >>> #13 0x080569d8 in do_action () >>> #14 0x08053ebf in run_action_list () >>> #15 0x08056e7a in do_action () >>> #16 0x08053ebf in run_action_list () >>> #17 0x08057d99 in run_top_route () >>> #18 0x0808ad6c in receive_msg () >>> #19 0x080bd2f2 in udp_rcv_loop () >>> #20 0x08069339 in main () >>> >>> -- >>> Best Regards, >>> Alex Massover >>> VoIP R&D TL >>> Jajah Inc. >>> >>> >>> >>>> -----Original Message----- >>>> From: users-boun...@lists.opensips.org [mailto:users- >>>> boun...@lists.opensips.org] On Behalf Of Alex Massover >>>> Sent: Thursday, January 21, 2010 2:24 PM >>>> To: OpenSIPS users mailling list >>>> Subject: Re: [OpenSIPS-Users] sched_yield() >>>> >>>> Hi, >>>> >>>> Another one.. It hangs for a number of seconds (but it's enough to >>>> cause to SIP timeouts - MSG queue jumps to 260K), it's hard to make >>>> >> a >> >>>> bt at the right moment. >>>> This one looks better because there's sched_yield() there :) >>>> >>>> (gdb) bt >>>> #0 0xb77d5424 in __kernel_vsyscall () >>>> #1 0xb771041c in sched_yield () from /lib/i686/cmov/libc.so.6 >>>> #2 0x080bf23d in new_avp () >>>> #3 0x080bf53f in add_avp () >>>> #4 0xb72c1c9c in ?? () from /usr/lib/opensips/modules/dialog.so >>>> #5 0x08055030 in do_action () >>>> #6 0x08053ebf in run_action_list () >>>> #7 0x08056e7a in do_action () >>>> #8 0x08053ebf in run_action_list () >>>> #9 0x08056e7a in do_action () >>>> #10 0x08053ebf in run_action_list () >>>> #11 0x08056e7a in do_action () >>>> #12 0x08053ebf in run_action_list () >>>> #13 0x08057d99 in run_top_route () >>>> #14 0x0808ad6c in receive_msg () >>>> #15 0x080bd2f2 in udp_rcv_loop () >>>> #16 0x08069339 in main () >>>> >>>> -- >>>> Best Regards, >>>> Alex Massover >>>> VoIP R&D TL >>>> Jajah Inc. >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: users-boun...@lists.opensips.org [mailto:users- >>>>> boun...@lists.opensips.org] On Behalf Of Alex Massover >>>>> Sent: Thursday, January 21, 2010 2:05 PM >>>>> To: OpenSIPS users mailling list >>>>> Subject: Re: [OpenSIPS-Users] sched_yield() >>>>> >>>>> Hi Andrei, >>>>> Hopefully this is it (with FASTLOCK) >>>>> >>>>> #0 0xb77d5424 in __kernel_vsyscall () >>>>> #1 0xb772babb in poll () from /lib/i686/cmov/libc.so.6 >>>>> #2 0xb77ba83a in ?? () from /lib/i686/cmov/libresolv.so.2 >>>>> #3 0xb77b8946 in __libc_res_nquery () from >>>>> /lib/i686/cmov/libresolv.so.2 >>>>> #4 0xb77b8fdb in ?? () from /lib/i686/cmov/libresolv.so.2 >>>>> #5 0xb77b92ae in __libc_res_nsearch () from >>>>> /lib/i686/cmov/libresolv.so.2 >>>>> #6 0xb77b96d4 in __res_nsearch () from >>>>> >> /lib/i686/cmov/libresolv.so.2 >> >>>>> #7 0xb77b808a in res_search () from /lib/i686/cmov/libresolv.so.2 >>>>> #8 0x0808c613 in get_record () >>>>> #9 0x0808cf05 in ?? () >>>>> #10 0x0808e385 in sip_resolvehost () >>>>> #11 0x0807a26c in mk_proxy () >>>>> #12 0xb7627d39 in t_relay_to () from >>>>> >> /usr/lib/opensips/modules/tm.so >> >>>>> #13 0xb7634501 in ?? () from /usr/lib/opensips/modules/tm.so >>>>> #14 0x08055030 in do_action () >>>>> #15 0x08053ebf in run_action_list () >>>>> #16 0x08095cf2 in eval_expr () >>>>> #17 0x080958d9 in eval_expr () >>>>> #18 0x08095919 in eval_expr () >>>>> #19 0x080554e2 in do_action () >>>>> #20 0x08053ebf in run_action_list () >>>>> #21 0x08056e7a in do_action () >>>>> #22 0x08053ebf in run_action_list () >>>>> ---Type <return> to continue, or q <return> to quit--- >>>>> #23 0x080569d8 in do_action () >>>>> #24 0x08053ebf in run_action_list () >>>>> #25 0x08056e7a in do_action () >>>>> #26 0x08053ebf in run_action_list () >>>>> #27 0x08057d99 in run_top_route () >>>>> #28 0x0808ad6c in receive_msg () >>>>> #29 0x080bd2f2 in udp_rcv_loop () >>>>> #30 0x08069339 in main () >>>>> (gdb) >>>>> >>>>> -- >>>>> Best Regards, >>>>> Alex Massover >>>>> VoIP R&D TL >>>>> Jajah Inc. >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: users-boun...@lists.opensips.org [mailto:users- >>>>>> boun...@lists.opensips.org] On Behalf Of Andrei Dragus >>>>>> Sent: Wednesday, January 20, 2010 2:58 PM >>>>>> To: OpenSIPS users mailling list >>>>>> Subject: Re: [OpenSIPS-Users] sched_yield() >>>>>> >>>>>> Hi, >>>>>> >>>>>> I think that there is a lock that is being held more than it >>>>>> >> should >> >>>>> be >>>>> >>>>> >>>>>> and that's what causes starvation. It would help us if you could >>>>>> >>>>>> >>>>> attach >>>>> >>>>> >>>>>> to a process using gdb and give us a full backtrace. >>>>>> >>>>>> Temporary solutions which should work would be to reduce the >>>>>> >> number >> >>>>> of >>>>> >>>>> >>>>>> processes to 4-6 or to recompile replacing -DFAST_LOCK with one of >>>>>> >>>>>> >>>>> the >>>>> >>>>> >>>>>> other options (-DUSE_POSIX_SEM or -DUSE_PTHREAD_MUTEX) but we >>>>>> >>>>>> >>>> should >>>> >>>> >>>>>> see >>>>>> where this is from to fix it. >>>>>> >>>>>> Alex Massover wrote: >>>>>> >>>>>> >>>>>>> Hi! >>>>>>> >>>>>>> Yes, from the source on debian, I build deb package. (I did some >>>>>>> >>>>>>> >>>>>> minor changes to the source, but the problem happens also without >>>>>> >>>>>> >>>> my >>>> >>>> >>>>>> changes) >>>>>> >>>>>> >>>>>>> 16 children on 4 cores. >>>>>>> >>>>>>> What do you suggest to reduce it to 4? It runs on 2.6.32 on >>>>>>> >>>>>>> >>>> VMware >>>> >>>> >>>>>> ESX. >>>>>> >>>>>> >>>>>>> I'm also trying now sleep(0) instead of sched_yield(). >>>>>>> >>>>>>> -- >>>>>>> Best Regards, >>>>>>> Alex Massover >>>>>>> VoIP R&D TL >>>>>>> Jajah Inc. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: users-boun...@lists.opensips.org [mailto:users- >>>>>>>> boun...@lists.opensips.org] On Behalf Of Andrei Dragus >>>>>>>> Sent: Wednesday, January 20, 2010 1:05 PM >>>>>>>> To: OpenSIPS users mailling list >>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield() >>>>>>>> >>>>>>>> Hi Alex, >>>>>>>> >>>>>>>> Are you building OpenSIPS from source? >>>>>>>> How many processes do you have and on how many cores? >>>>>>>> >>>>>>>> >>>>>>>> Alex Massover wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Hello! >>>>>>>>> >>>>>>>>> I'm facing a strange problem, sometimes under a stress OpenSIPS >>>>>>>>> "locks" - load average jumps, SIP processing delays, opensips >>>>>>>>> >>>>>>>>> >>>> msg >>>> >>>> >>>>>>>>> queue fills with a lot of sip messages, opensips processes >>>>>>>>> >>>>>>>>> >>>> start >>>> >>>> >>>>> to >>>>> >>>>> >>>>>>>>> comsume a lot of CPU. >>>>>>>>> >>>>>>>>> And strace shows: >>>>>>>>> >>>>>>>>> sched_yield() >>>>>>>>> >>>>>>>>> sched_yield() >>>>>>>>> >>>>>>>>> sched_yield() >>>>>>>>> >>>>>>>>> sched_yield() >>>>>>>>> >>>>>>>>> .... >>>>>>>>> >>>>>>>>> for all processes. >>>>>>>>> >>>>>>>>> If I stop the stress - after a while (not immediately) - it >>>>>>>>> >>>>>>>>> >>>>>> unlocks, >>>>>> >>>>>> >>>>>>>>> also suddenly, I can see in top that all opensips processes >>>>>>>>> >>>>>>>>> >>>> stop >>>> >>>> >>>>> to >>>>> >>>>> >>>>>>>>> consume CPU. >>>>>>>>> >>>>>>>>> What can it be? Some kind of starvation? >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> >>>>>>>>> Alex Massover >>>>>>>>> >>>>>>>>> VoIP R&D TL >>>>>>>>> >>>>>>>>> Jajah Inc. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> This mail was sent via Mail-SeCure System. >>>>>>>>> --------------------------------------------------------------- >>>>>>>>> >>>>>>>>> >>>> -- >>>> >>>> >>>>> -- >>>>> >>>>> >>>>>> -- >>>>>> >>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Users mailing list >>>>>>>>> Users@lists.opensips.org >>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> -- >>>>>>>> Andrei Dragus >>>>>>>> www.voice-system.ro >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Users mailing list >>>>>>>> Users@lists.opensips.org >>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>>>>>> >>>>>>>> This mail was received via Mail-SeCure System. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> This mail was sent via Mail-SeCure System. >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Users mailing list >>>>>>> Users@lists.opensips.org >>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>>>>> >>>>>>> >>>>>>> >>>>>> -- >>>>>> Andrei Dragus >>>>>> www.voice-system.ro >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Users mailing list >>>>>> Users@lists.opensips.org >>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>>>> >>>>>> This mail was received via Mail-SeCure System. >>>>>> >>>>>> >>>>>> >>>>> This mail was sent via Mail-SeCure System. >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Users mailing list >>>>> Users@lists.opensips.org >>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>>> >>>>> This mail was received via Mail-SeCure System. >>>>> >>>>> >>>>> >>>> This mail was sent via Mail-SeCure System. >>>> >>>> >>>> >>>> _______________________________________________ >>>> Users mailing list >>>> Users@lists.opensips.org >>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>> >>>> This mail was received via Mail-SeCure System. >>>> >>>> >>>> >>> This mail was sent via Mail-SeCure System. >>> >>> >>> >>> _______________________________________________ >>> Users mailing list >>> Users@lists.opensips.org >>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>> >>> >> -- >> Andrei Dragus >> www.voice-system.ro >> >> >> _______________________________________________ >> Users mailing list >> Users@lists.opensips.org >> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >> >> This mail was received via Mail-SeCure System. >> >> > > > This mail was sent via Mail-SeCure System. > > > > _______________________________________________ > Users mailing list > Users@lists.opensips.org > http://lists.opensips.org/cgi-bin/mailman/listinfo/users > -- Andrei Dragus www.voice-system.ro _______________________________________________ Users mailing list Users@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/users