Hi,

I also think that the locking -> busy wait -> sched_yield() pattern is problematic in itself. I'm not a scheduling specialist, but it looks like this mechanism does not let us use the full power of modern n-way SMP machines.
Actually, I did try pthread mutexes, as Andrei had also suggested, but the performance was even worse than with the assembler locks.

--
Best Regards,
Alex Massover
VoIP R&D TL
Jajah Inc.

> -----Original Message-----
> From: users-boun...@lists.opensips.org [mailto:users-boun...@lists.opensips.org] On Behalf Of Bogdan-Andrei Iancu
> Sent: Monday, January 25, 2010 4:03 PM
> To: OpenSIPS users mailling list
> Subject: Re: [OpenSIPS-Users] sched_yield()
>
> Hi Alex,
>
> A wild guess is that the bottleneck was not actually the OpenSIPS memory manager but the locking system. The default locking system uses user-space locks based on assembler tricks, such as synchronization on a memory cell (volatile variables).
> So I would guess the locking is the real culprit (on the VM), and you only noticed its side effect in the memory manager, where locking is used very intensively. Once you removed the need to synchronize memory access (with one process), the system started to behave normally.
>
> An interesting experiment would be to re-enable the memory balloon on the VM and change the locking implementation (use -DUSE_PTHREAD_MUTEX in Makefile.defs).
>
> Regards,
> Bogdan
>
> Alex Massover wrote:
> > Hi!
> >
> > I use dialog to store/retrieve variables, but without profiling.
> >
> > It looks like I found the problem: VMware has a memory balloon that allows overcommitting physical memory to virtual machines (provided that not all guests need all their memory all the time). Usually it behaves OK, but it has a dramatic performance effect on OpenSIPS. Probably the memory balloon is aware of how system memory management works but unaware of the OpenSIPS internal memory manager.
> >
> > After removing the memory balloon driver there are no more hangs with 4/8 children. But a single worker child worked well even before removing the balloon (and I'm on 4-way SMP)!
> >
> > It looks like there's no rule for how many children to configure; it depends on the modules in use, memory speed, CPU speed and so on. Only a stress test on each concrete system gives the answer.
> >
> > Hope the new architecture will take care of such issues as well :)
> >
> >> -----Original Message-----
> >> From: users-boun...@lists.opensips.org [mailto:users-boun...@lists.opensips.org] On Behalf Of Bogdan-Andrei Iancu
> >> Sent: Friday, January 22, 2010 7:43 PM
> >> To: OpenSIPS users mailling list
> >> Subject: Re: [OpenSIPS-Users] sched_yield()
> >>
> >> Hi Alex,
> >>
> >> The bug was fixed; update from SVN.
> >>
> >> Regarding your observation on "forking" versus "no-forking": in some cases (when no blocking operations are done), a single process may be faster than multiple processes on a single-core machine. The CPU power is the same and fully used (no blocking), but in forking mode you also have the overhead of process switching and the locking/synchronization dead times.
> >>
> >> Regards,
> >> Bogdan
> >>
> >> Alex Massover wrote:
> >>> Hi,
> >>>
> >>> Unfortunately 'fifo get_statistics' crashes OpenSIPS; I opened a bug.
> >>> But there is no chance that 1G is not enough: only about 400M is used by all Linux processes:
> >>>
> >>> Mem: 3115120k total, 398360k used, 2716760k free, 536k buffers
> >>>
> >>> Maybe sched_yield() just causes problems on 2.6.32, on VMware, or on SMP?
> >>> I'm now trying with fork=yes and children=1.
> >>> If I have only one worker child, is it supposed to lock and sched_yield() itself for any reason?
> >>> Meanwhile, with a single child OpenSIPS easily handles 4K concurrent calls at 15 cps; the load average is 0.00 (!) and the CPU is about 96% idle.
> >>> I wonder if a single worker child also hangs.
> >>>
> >>>> -----Original Message-----
> >>>> From: users-boun...@lists.opensips.org [mailto:users-boun...@lists.opensips.org] On Behalf Of Andrei Dragus
> >>>> Sent: Friday, January 22, 2010 1:17 PM
> >>>> To: OpenSIPS users mailling list
> >>>> Subject: Re: [OpenSIPS-Users] sched_yield()
> >>>>
> >>>> Hi,
> >>>>
> >>>> The new f_malloc will not do anything extra compared to the old one until memory usage goes way up.
> >>>> I've added a warning in mem/f_malloc.c so you can see when defragmentation starts. If you get this warning, then it is clear that the problem comes from high memory usage.
> >>>>
> >>>> 1 GB for 4k calls seems like a lot (250k per call). You can try "opensipsctl fifo get_statistics shmem:" and see what the memory usage is for different numbers of concurrent calls (1k, 2k, 3k, 4k). If the memory usage is indeed that high, we should investigate the cause.
> >>>>
> >>>> Alex Massover wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Shared memory is now 1G (-m 1024), and all memory is dedicated to the virtual machine (until now it was shared).
> >>>>> But it still happens, just not as often.
> >>>>>
> >>>>> I originate the calls for this stress test from Asterisk with the same resources, and it looks like Asterisk performs much better than OpenSIPS. How can that be?
> >>>>>
> >>>>> In my stress test OpenSIPS does no blocking/slow requests. And it's just 4K concurrent calls, each one 2-3 min long.
> >>>>>
> >>>>> Maybe OpenSIPS does too much low-level memory management and a virtual machine is not suitable for it (even though Asterisk runs well on VMware)?
> >>>>>
> >>>>> I'm not sure, but I have a feeling that 1.4 performed better. What can cause a performance degradation in 1.6? Storing vars in the dialog, the new malloc()?
> >>>>>
> >>>>> (gdb) bt
> >>>>> #0 0xb78ad424 in __kernel_vsyscall ()
> >>>>> #1 0xb77e841c in sched_yield () from /lib/i686/cmov/libc.so.6
> >>>>> #2 0x080bf23d in new_avp ()
> >>>>> #3 0x080bf53f in add_avp ()
> >>>>> #4 0x08080e6e in pv_set_avp ()
> >>>>> #5 0x0808229c in pv_set_value ()
> >>>>> #6 0x08053c9d in do_assign ()
> >>>>> #7 0x0805447a in do_action ()
> >>>>> #8 0x08053ebf in run_action_list ()
> >>>>> #9 0x08056e7a in do_action ()
> >>>>> #10 0x08053ebf in run_action_list ()
> >>>>> #11 0x08056e7a in do_action ()
> >>>>> #12 0x08053ebf in run_action_list ()
> >>>>> #13 0x080569d8 in do_action ()
> >>>>> #14 0x08053ebf in run_action_list ()
> >>>>> #15 0x08056e7a in do_action ()
> >>>>> #16 0x08053ebf in run_action_list ()
> >>>>> #17 0x08057d99 in run_top_route ()
> >>>>> #18 0x0808ad6c in receive_msg ()
> >>>>> #19 0x080bd2f2 in udp_rcv_loop ()
> >>>>> #20 0x08069339 in main ()
> >>>>> (gdb) quit
> >>>>>
> >>>>> (gdb) bt
> >>>>> #0 0xb78ad424 in __kernel_vsyscall ()
> >>>>> #1 0xb77e841c in sched_yield () from /lib/i686/cmov/libc.so.6
> >>>>> #2 0xb76f52cd in build_cell () from /usr/lib/opensips/modules/tm.so
> >>>>> #3 0xb770ac4a in t_newtran () from /usr/lib/opensips/modules/tm.so
> >>>>> #4 0xb76ff7b8 in t_relay_to () from /usr/lib/opensips/modules/tm.so
> >>>>> #5 0xb770c501 in ?? () from /usr/lib/opensips/modules/tm.so
> >>>>> #6 0x08055030 in do_action ()
> >>>>> #7 0x08053ebf in run_action_list ()
> >>>>> #8 0x08095cf2 in eval_expr ()
> >>>>> #9 0x080958d9 in eval_expr ()
> >>>>> #10 0x08095919 in eval_expr ()
> >>>>> #11 0x080554e2 in do_action ()
> >>>>> #12 0x08053ebf in run_action_list ()
> >>>>> #13 0x080569d8 in do_action ()
> >>>>> #14 0x08053ebf in run_action_list ()
> >>>>> #15 0x08056e7a in do_action ()
> >>>>> #16 0x08053ebf in run_action_list ()
> >>>>> #17 0x08057d99 in run_top_route ()
> >>>>> #18 0x0808ad6c in receive_msg ()
> >>>>> #19 0x080bd2f2 in udp_rcv_loop ()
> >>>>> #20 0x08069339 in main ()
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: users-boun...@lists.opensips.org [mailto:users-boun...@lists.opensips.org] On Behalf Of Andrei Dragus
> >>>>>> Sent: Thursday, January 21, 2010 3:43 PM
> >>>>>> To: OpenSIPS users mailling list
> >>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
> >>>>>>
> >>>>>> My guess is that there is not enough shared memory. When an allocation fails, OpenSIPS tries to defragment memory to make room, which takes a lot of time and must be done under lock.
> >>>>>>
> >>>>>> Please try to increase the shared memory size and tell me if the problem persists.
> >>>>>>
> >>>>>> Alex Massover wrote:
> >>>>>>> Hi!
> >>>>>>>
> >>>>>>> Yes, with -DF_MALLOC.
> >>>>>>>
> >>>>>>> 1.6.1 from sources; I build a deb package.
> >>>>>>> I use 128M of shared and 10*1024*1024 of private memory (I can increase it, no problem).
> >>>>>>>
> >>>>>>> Hmmmm, "opensipsctl fifo get_statistics all" crashes/stops OpenSIPS.
> >>>>>>> 'fifo uptime' or 'fifo debug' are OK.
> >>>>>>>
> >>>>>>> strace while running 'fifo get_statistics all':
> >>>>>>> Process 9509 attached - interrupt to quit
> >>>>>>> pause() = ? ERESTARTNOHAND (To be restarted)
> >>>>>>> --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> pause() = ? ERESTARTNOHAND (To be restarted)
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> waitpid(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGUSR2}], WNOHANG) = 9520
> >>>>>>> waitpid(-1, 0xbf84b4c8, WNOHANG) = 0
> >>>>>>> kill(0, SIGTERM) = 0
> >>>>>>> --- SIGTERM (Terminated) @ 0 (0) ---
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [TERM])
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> rt_sigaction(SIGALRM, {0x8065920, [ALRM], SA_RESTART}, {SIG_DFL}, 8) = 0
> >>>>>>> alarm(60) = 0
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9514
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9519
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9521
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9522
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9512
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9510
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9516
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9515
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9517
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9524
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9525
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> --- SIGCHLD (Child exited) @ 0 (0) ---
> >>>>>>> sigreturn() = ? (mask now [])
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9511
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9513
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9518
> >>>>>>> wait4(-1, NULL, 0, NULL) = 9523
> >>>>>>> wait4(-1, NULL, 0, NULL) = -1 ECHILD (No child processes)
> >>>>>>> rt_sigaction(SIGALRM, {0x8066080, [ALRM], SA_RESTART}, {0x8065920, [ALRM], SA_RESTART}, 8) = 0
> >>>>>>> stat64("/tmp/opensips_fifo", {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0
> >>>>>>> unlink("/tmp/opensips_fifo") = 0
> >>>>>>> munmap(0xaed25000, 134217728) = 0
> >>>>>>> unlink("/var/run/opensips/opensips.pid") = 0
> >>>>>>> alarm(0) = 60
> >>>>>>> rt_sigaction(SIGALRM, {SIG_IGN}, {0x8066080, [ALRM], SA_RESTART}, 8) = 0
> >>>>>>> exit_group(0) = ?
> >>>>>>> Process 9509 detached
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: users-boun...@lists.opensips.org [mailto:users-boun...@lists.opensips.org] On Behalf Of Andrei Dragus
> >>>>>>>> Sent: Thursday, January 21, 2010 3:09 PM
> >>>>>>>> To: OpenSIPS users mailling list
> >>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Since all the backtraces are in allocation routines, my guess is that the shared memory lock might be causing a problem.
> >>>>>>>>
> >>>>>>>> Are you compiling with -DF_MALLOC?
> >>>>>>>> What version of OpenSIPS are you using?
> >>>>>>>> What is the total shared memory pool you are allocating?
> >>>>>>>> How much memory are you using? (Use: opensipsctl fifo get_statistics all)
> >>>>>>>>
> >>>>>>>> Alex Massover wrote:
> >>>>>>>>> Some more,
> >>>>>>>>>
> >>>>>>>>> (gdb) bt
> >>>>>>>>> #0 0xb78dc424 in __kernel_vsyscall ()
> >>>>>>>>> #1 0xb781741c in sched_yield () from /lib/i686/cmov/libc.so.6
> >>>>>>>>> #2 0xb73d77fd in build_new_dlg () from /usr/lib/opensips/modules/dialog.so
> >>>>>>>>> #3 0xb73d4b81 in dlg_create_dialog () from /usr/lib/opensips/modules/dialog.so
> >>>>>>>>> #4 0xb73c9c9e in ?? () from /usr/lib/opensips/modules/dialog.so
> >>>>>>>>> #5 0x08055030 in do_action ()
> >>>>>>>>> #6 0x08053ebf in run_action_list ()
> >>>>>>>>> #7 0x08056e7a in do_action ()
> >>>>>>>>> #8 0x08053ebf in run_action_list ()
> >>>>>>>>> #9 0x08057d99 in run_top_route ()
> >>>>>>>>> #10 0x0808ad6c in receive_msg ()
> >>>>>>>>> #11 0x080bd2f2 in udp_rcv_loop ()
> >>>>>>>>> #12 0x08069339 in main ()
> >>>>>>>>>
> >>>>>>>>> (gdb) bt
> >>>>>>>>> #0 0xb78dc424 in __kernel_vsyscall ()
> >>>>>>>>> #1 0xb781741c in sched_yield () from /lib/i686/cmov/libc.so.6
> >>>>>>>>> #2 0xb77242cd in build_cell () from /usr/lib/opensips/modules/tm.so
> >>>>>>>>> #3 0xb7739c4a in t_newtran () from /usr/lib/opensips/modules/tm.so
> >>>>>>>>> #4 0xb772e7b8 in t_relay_to () from /usr/lib/opensips/modules/tm.so
> >>>>>>>>> #5 0xb773b501 in ?? () from /usr/lib/opensips/modules/tm.so
> >>>>>>>>> #6 0x08055030 in do_action ()
> >>>>>>>>> #7 0x08053ebf in run_action_list ()
> >>>>>>>>> #8 0x08095cf2 in eval_expr ()
> >>>>>>>>> #9 0x080958d9 in eval_expr ()
> >>>>>>>>> #10 0x08095919 in eval_expr ()
> >>>>>>>>> #11 0x080554e2 in do_action ()
> >>>>>>>>> #12 0x08053ebf in run_action_list ()
> >>>>>>>>> #13 0x080569d8 in do_action ()
> >>>>>>>>> #14 0x08053ebf in run_action_list ()
> >>>>>>>>> #15 0x08056e7a in do_action ()
> >>>>>>>>> #16 0x08053ebf in run_action_list ()
> >>>>>>>>> #17 0x08057d99 in run_top_route ()
> >>>>>>>>> #18 0x0808ad6c in receive_msg ()
> >>>>>>>>> #19 0x080bd2f2 in udp_rcv_loop ()
> >>>>>>>>> #20 0x08069339 in main ()
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: users-boun...@lists.opensips.org [mailto:users-boun...@lists.opensips.org] On Behalf Of Alex Massover
> >>>>>>>>>> Sent: Thursday, January 21, 2010 2:24 PM
> >>>>>>>>>> To: OpenSIPS users mailling list
> >>>>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> Another one. It hangs for a number of seconds, but that is enough to cause SIP timeouts (the MSG queue jumps to 260K), so it's hard to get a bt at the right moment.
> >>>>>>>>>> This one looks better because there's a sched_yield() in it :)
> >>>>>>>>>>
> >>>>>>>>>> (gdb) bt
> >>>>>>>>>> #0 0xb77d5424 in __kernel_vsyscall ()
> >>>>>>>>>> #1 0xb771041c in sched_yield () from /lib/i686/cmov/libc.so.6
> >>>>>>>>>> #2 0x080bf23d in new_avp ()
> >>>>>>>>>> #3 0x080bf53f in add_avp ()
> >>>>>>>>>> #4 0xb72c1c9c in ?? () from /usr/lib/opensips/modules/dialog.so
> >>>>>>>>>> #5 0x08055030 in do_action ()
> >>>>>>>>>> #6 0x08053ebf in run_action_list ()
> >>>>>>>>>> #7 0x08056e7a in do_action ()
> >>>>>>>>>> #8 0x08053ebf in run_action_list ()
> >>>>>>>>>> #9 0x08056e7a in do_action ()
> >>>>>>>>>> #10 0x08053ebf in run_action_list ()
> >>>>>>>>>> #11 0x08056e7a in do_action ()
> >>>>>>>>>> #12 0x08053ebf in run_action_list ()
> >>>>>>>>>> #13 0x08057d99 in run_top_route ()
> >>>>>>>>>> #14 0x0808ad6c in receive_msg ()
> >>>>>>>>>> #15 0x080bd2f2 in udp_rcv_loop ()
> >>>>>>>>>> #16 0x08069339 in main ()
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: users-boun...@lists.opensips.org [mailto:users-boun...@lists.opensips.org] On Behalf Of Alex Massover
> >>>>>>>>>>> Sent: Thursday, January 21, 2010 2:05 PM
> >>>>>>>>>>> To: OpenSIPS users mailling list
> >>>>>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Andrei,
> >>>>>>>>>>> Hopefully this is it (with FASTLOCK):
> >>>>>>>>>>>
> >>>>>>>>>>> #0 0xb77d5424 in __kernel_vsyscall ()
> >>>>>>>>>>> #1 0xb772babb in poll () from /lib/i686/cmov/libc.so.6
> >>>>>>>>>>> #2 0xb77ba83a in ?? () from /lib/i686/cmov/libresolv.so.2
> >>>>>>>>>>> #3 0xb77b8946 in __libc_res_nquery () from /lib/i686/cmov/libresolv.so.2
> >>>>>>>>>>> #4 0xb77b8fdb in ?? () from /lib/i686/cmov/libresolv.so.2
> >>>>>>>>>>> #5 0xb77b92ae in __libc_res_nsearch () from /lib/i686/cmov/libresolv.so.2
> >>>>>>>>>>> #6 0xb77b96d4 in __res_nsearch () from /lib/i686/cmov/libresolv.so.2
> >>>>>>>>>>> #7 0xb77b808a in res_search () from /lib/i686/cmov/libresolv.so.2
> >>>>>>>>>>> #8 0x0808c613 in get_record ()
> >>>>>>>>>>> #9 0x0808cf05 in ?? ()
> >>>>>>>>>>> #10 0x0808e385 in sip_resolvehost ()
> >>>>>>>>>>> #11 0x0807a26c in mk_proxy ()
> >>>>>>>>>>> #12 0xb7627d39 in t_relay_to () from /usr/lib/opensips/modules/tm.so
> >>>>>>>>>>> #13 0xb7634501 in ?? () from /usr/lib/opensips/modules/tm.so
> >>>>>>>>>>> #14 0x08055030 in do_action ()
> >>>>>>>>>>> #15 0x08053ebf in run_action_list ()
> >>>>>>>>>>> #16 0x08095cf2 in eval_expr ()
> >>>>>>>>>>> #17 0x080958d9 in eval_expr ()
> >>>>>>>>>>> #18 0x08095919 in eval_expr ()
> >>>>>>>>>>> #19 0x080554e2 in do_action ()
> >>>>>>>>>>> #20 0x08053ebf in run_action_list ()
> >>>>>>>>>>> #21 0x08056e7a in do_action ()
> >>>>>>>>>>> #22 0x08053ebf in run_action_list ()
> >>>>>>>>>>> #23 0x080569d8 in do_action ()
> >>>>>>>>>>> #24 0x08053ebf in run_action_list ()
> >>>>>>>>>>> #25 0x08056e7a in do_action ()
> >>>>>>>>>>> #26 0x08053ebf in run_action_list ()
> >>>>>>>>>>> #27 0x08057d99 in run_top_route ()
> >>>>>>>>>>> #28 0x0808ad6c in receive_msg ()
> >>>>>>>>>>> #29 0x080bd2f2 in udp_rcv_loop ()
> >>>>>>>>>>> #30 0x08069339 in main ()
> >>>>>>>>>>> (gdb)
> >>>>>>>>>>>
> >>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>> From: users-boun...@lists.opensips.org [mailto:users-boun...@lists.opensips.org] On Behalf Of Andrei Dragus
> >>>>>>>>>>>> Sent: Wednesday, January 20, 2010 2:58 PM
> >>>>>>>>>>>> To: OpenSIPS users mailling list
> >>>>>>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think that a lock is being held longer than it should be, and that's what causes the starvation.
> >>>>>>>>>>>> It would help us if you could attach to a process using gdb and give us a full backtrace.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Temporary solutions that should work would be to reduce the number of processes to 4-6, or to recompile replacing -DFAST_LOCK with one of the other options (-DUSE_POSIX_SEM or -DUSE_PTHREAD_MUTEX), but we should find out where this comes from in order to fix it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Alex Massover wrote:
> >>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Yes, from source on Debian; I build a deb package. (I made some minor changes to the source, but the problem also happens without my changes.)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 16 children on 4 cores.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Do you suggest reducing it to 4?
It runs on 2.6.32 > on > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>> VMware > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>>> ESX. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> I'm also trying now sleep(0) instead of sched_yield(). > >>>>>>>>>>>>> > >>>>>>>>>>>>> -- > >>>>>>>>>>>>> Best Regards, > >>>>>>>>>>>>> Alex Massover > >>>>>>>>>>>>> VoIP R&D TL > >>>>>>>>>>>>> Jajah Inc. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> -----Original Message----- > >>>>>>>>>>>>>> From: users-boun...@lists.opensips.org [mailto:users- > >>>>>>>>>>>>>> boun...@lists.opensips.org] On Behalf Of Andrei Dragus > >>>>>>>>>>>>>> Sent: Wednesday, January 20, 2010 1:05 PM > >>>>>>>>>>>>>> To: OpenSIPS users mailling list > >>>>>>>>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield() > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi Alex, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Are you building OpenSIPS from source? > >>>>>>>>>>>>>> How many processes do you have and on how many cores? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Alex Massover wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hello! 
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm facing a strange problem: sometimes under stress OpenSIPS "locks". The load average jumps, SIP processing is delayed, the OpenSIPS message queue fills with a lot of SIP messages, and the OpenSIPS processes start to consume a lot of CPU.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> And strace shows:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> sched_yield()
> >>>>>>>>>>>>>>> sched_yield()
> >>>>>>>>>>>>>>> sched_yield()
> >>>>>>>>>>>>>>> sched_yield()
> >>>>>>>>>>>>>>> ....
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> for all processes.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If I stop the stress, after a while (not immediately) it unlocks, also suddenly: I can see in top that all OpenSIPS processes stop consuming CPU.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What can it be? Some kind of starvation?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This mail was sent via Mail-SeCure System.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>>> Users mailing list
> >>>>>>>>>>>>>>> Users@lists.opensips.org
> >>>>>>>>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Andrei Dragus
> >>>>>>>>>>>>>> www.voice-system.ro
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This mail was received via Mail-SeCure System.
> >>
> >> --
> >> Bogdan-Andrei Iancu
> >> www.voice-system.ro