Hi Kennard,

You are perfectly right: a change moved the set_t(cell) call earlier in the function (we needed it in the callbacks), and this bug looks like a side effect of that change.
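The side effect discussed here follows a simple pattern. The cell is published through the global "current transaction" pointer early, then freed on an out-of-memory error path while the global still references it. The C below is a minimal, hypothetical model of that pattern, not the tm module's code. `struct cell`, `T`, `set_t_model()`, `cell_alloc()`, `build_cell_model()` and `post_cleanup_model()` are invented for illustration, and a one-slot pool stands in for shared memory so that a "freed" cell is reused, as in the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the bug discussed in this thread; these are NOT
 * the OpenSIPS tm functions. A one-slot pool stands in for shared memory
 * so that a "freed" cell is reused, as in the report. */
struct cell {
    int in_use;        /* 1 while the cell belongs to a transaction */
    int on_wait_list;  /* 1 once cleanup has queued it on the wait timer */
};

static struct cell pool[1];
static struct cell *T = NULL;   /* per-process "current transaction" */

static void set_t_model(struct cell *t) { T = t; }

static struct cell *cell_alloc(void)
{
    for (size_t i = 0; i < sizeof pool / sizeof pool[0]; i++) {
        if (!pool[i].in_use) {
            pool[i] = (struct cell){ .in_use = 1 };
            return &pool[i];
        }
    }
    return NULL;
}

static void cell_free(struct cell *c) { c->in_use = 0; }

/* clone_ok == 0 simulates the message clone failing for lack of memory
 * after the cell was already published in T; fix == 1 adds the reset. */
static struct cell *build_cell_model(int clone_ok, int fix)
{
    struct cell *c = cell_alloc();

    if (c == NULL)
        return NULL;
    set_t_model(c);            /* published early, for the callbacks */
    if (!clone_ok) {
        cell_free(c);
        if (fix)
            set_t_model(NULL); /* the set_t(0) idea: drop the stale pointer */
        return NULL;
    }
    return c;
}

/* Post-request cleanup that trusts T. Returns 1 if it queued a cell that
 * is no longer in use, i.e. it "resurrected" freed memory. */
static int post_cleanup_model(void)
{
    int resurrected;

    if (T == NULL)
        return 0;
    /* in this model T only ever points into the pool */
    assert(T >= pool && T < pool + sizeof pool / sizeof pool[0]);
    T->on_wait_list = 1;
    resurrected = !T->in_use;
    set_t_model(NULL);
    return resurrected;
}
```

With `clone_ok = 0` and `fix = 0`, `post_cleanup_model()` returns 1 because it queued a released cell. With `fix = 1` it returns 0, since `T` was reset before the cleanup ran.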
I will make the fix asap! Thanks and regards,
Bogdan

kennard_wh...@logitech.com wrote:
>
> Hi Bogdan,
>
> I believe I found the problem. When sip_msg_cloner() within
> build_cell() fails due to out-of-mem, a dangling pointer to the cell
> is left in the global transaction pointer. Later on, the post_cb() code
> attempts to clean this up, "resurrects" the now-freed memory and,
> in particular, puts it on a wait timer. My guess is that later on this
> memory is allocated to a new transaction which eventually uses the
> same wait timer link, and the two threads fight it out.
>
> Adding a set_t(0) fixes the problem. While looking, I believe I found
> a memory leak in the uac creation code that is also triggered by
> out-of-memory conditions. Please see the attached patch.
>
> The "problem" also manifests itself as a core dump in
> insert_timer_unsafe. Once I disabled optimization, I saw multiple
> variations within this function. The most common was that ptr==tl
> after the search. The sequence of events for this is: new cell
> created, stored into global T, freed, added to the wait list by the
> cleanup code from global T, memory re-allocated into a new cell, timer
> link zeroed, and then added again to the wait list. At least that is
> my best guess.
>
> Regards,
> Kennard
>
> (See attached file: opensips-tm-cell.patch)
>
> From: Bogdan-Andrei Iancu <bog...@voice-system.ro>
> To: OpenSIPS users mailling list <users@lists.opensips.org>
> Date: 10/22/2010 07:02 AM
> Subject: Re: [OpenSIPS-Users] opensips tm timer core dump
> Sent by: users-boun...@lists.opensips.org
>
> ------------------------------------------------------------------------
>
> Hi Kennard,
>
> I suppose the bt is the same?
> Do you still have the core file?
>
> Regards,
> Bogdan
>
> kennard_wh...@logitech.com wrote:
> >
> > Hi Bogdan,
> >
> > I replicated the error. Unfortunately, the entire insert_timer_unsafe
> > function had been inlined, and little information is available:
> >
> > Program terminated with signal 11, Segmentation fault.
> > #0 0x00007f8b8356c2c2 in insert_timer_unsafe (new_tl=0x7f8b7a54e310,
> > list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at timer.c:731
> > 731 timer.c: No such file or directory.
> > in timer.c
> > (gdb) print tl
> > $1 = <value optimized out>
> > (gdb) print *tl
> > Cannot access memory at address 0x0
> > (gdb) print ptr
> > $2 = <value optimized out>
> > (gdb) print *ptr
> > Cannot access memory at address 0x0
> > (gdb) print *new_tl
> > No symbol "new_tl" in current context.
> > (gdb) up
> > #1 set_1timer (new_tl=0x7f8b7a54e310, list_id=WT_TIMER_LIST,
> > ext_timeout=<value optimized out>) at timer.c:904
> > 904 in timer.c
> > (gdb) print *new_tl
> > $3 = {next_tl = 0x0, prev_tl = 0x0, ld_tl = 0x0, time_out = 0,
> > timer_list = 0x0, deleted = 0}
> > (gdb) print list
> > $4 = <value optimized out>
> > (gdb) print timeout
> > $5 = 32
> > (gdb) print new_tl
> > $6 = (struct timer_link *) 0x7f8b7a54e310
> >
> > I'll keep the core for a while; please let me know if there is
> > anything else I can try.
> >
> > Thanks,
> > Kennard
> >
> > From: Bogdan-Andrei Iancu <bog...@voice-system.ro>
> > To: OpenSIPS users mailling list <users@lists.opensips.org>
> > Date: 10/08/2010 04:40 AM
> > Subject: Re: [OpenSIPS-Users] opensips tm timer core dump
> > Sent by: users-boun...@lists.opensips.org
> >
> > ------------------------------------------------------------------------
> >
> > Hi Kennard,
> >
> > Ok, keep the core next time :)
> >
> > Regards,
> > Bogdan
> >
> > kennard_wh...@logitech.com wrote:
> > >
> > > Hi Bogdan,
> > >
> > > Thanks for explaining the child processes involved; I misunderstood
> > > what was happening.
> > >
> > > Unfortunately, I don't have the core anymore. My recollection is that
> > > I couldn't print anything useful due to compiler optimization. That
> > > said, this should be pretty easy to re-create, and I'll get more dumps
> > > next time it happens.
> > >
> > > Regards,
> > > Kennard
> > >
> > > From: Bogdan-Andrei Iancu <bog...@voice-system.ro>
> > > To: OpenSIPS users mailling list <users@lists.opensips.org>
> > > Date: 10/05/2010 01:41 AM
> > > Subject: Re: [OpenSIPS-Users] opensips tm timer core dump
> > > Sent by: users-boun...@lists.opensips.org
> > >
> > > ------------------------------------------------------------------------
> > >
> > > Hi Kennard,
> > >
> > > The core was generated by process 22255:
> > > [22238]: INFO:core:handle_sigs: child process 22255 exited by a signal 11
> > >
> > > and this process also reported mem problems:
> > > [22255]: ERROR:tm:new_t: out of mem
> > >
> > > Can you print the "tl" or "ptr" variables in frame 0?
> > >
> > > Regards,
> > > Bogdan
> > >
> > > kennard_wh...@logitech.com wrote:
> > > >
> > > > Running against opensips HEAD, I got a segfault in the tm timer code.
> > > > I believe this is triggered by running out of shared memory.
> > > >
> > > > The stack trace:
> > > >
> > > > (gdb) where
> > > > #0 0x00007fe8f8d96212 in insert_timer_unsafe (new_tl=0x7fe8f66337b0,
> > > > list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at timer.c:731
> > > > #1 set_1timer (new_tl=0x7fe8f66337b0, list_id=WT_TIMER_LIST,
> > > > ext_timeout=<value optimized out>) at timer.c:904
> > > > #2 0x00007fe8f8d78ac8 in t_release_transaction (trans=0x7fe8f6633730)
> > > > at t_funcs.c:122
> > > > #3 0x00007fe8f8d808e5 in t_unref (p_msg=<value optimized out>)
> > > > at t_lookup.c:1152
> > > > #4 0x0000000000483ae5 in exec_post_req_cb ()
> > > > #5 0x000000000046c1e4 in receive_msg ()
> > > > #6 0x00000000004bc77c in udp_rcv_loop ()
> > > > #7 0x000000000042de9c in main ()
> > > >
> > > > The offending code (I believe):
> > > >     if (tl->time_out==ptr->time_out) {
> > > >         tl->ld_tl = ptr->ld_tl;
> > > >         ptr->ld_tl = 0;
> > > >         tl->ld_tl->ld_tl = tl;  /* <-- SEG FAULT HERE (according to trace) */
> > > >     } else {
> > > >         tl->ld_tl = tl;
> > > >     }
> > > >
> > > > Unfortunately, due to optimization I cannot dump anything useful, and
> > > > I'm not convinced the actual fault is on the line indicated. Note that
> > > > the core dump is not from one of the processes that reported out of
> > > > memory. Maybe one of the other processes left the timer list in a
> > > > corrupt state?
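Kennard's later guess involves a freed cell's wait-timer link being zeroed while it is still linked and then inserted again. A simplified sorted timer list can illustrate this. The C below is a hypothetical model, not timer.c. It leaves out the ld_tl grouping of equal timeouts, and `struct tlink`, `struct tlist`, `tlist_init()` and `tlist_insert()` are invented names. The model assumes the new time_out is assigned before the search. Under that assumption, the backward search can stop on the very node being inserted (the ptr==tl case reported in the thread). In the snippet quoted above, that case would set tl->ld_tl to the zeroed value before tl->ld_tl->ld_tl is written. This is consistent with the NULL accesses seen in gdb, but it does not prove they are the same thing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified timer list (not the tm module's timer.c):
 * links are kept sorted by time_out between two sentinel nodes, and
 * the ld_tl grouping of equal timeouts is left out. */
struct tlink {
    struct tlink *next, *prev;
    unsigned int time_out;
};

struct tlist {
    struct tlink head, tail;   /* sentinels */
};

static void tlist_init(struct tlist *l)
{
    l->head.prev = NULL;
    l->head.next = &l->tail;
    l->head.time_out = 0;
    l->tail.prev = &l->head;
    l->tail.next = NULL;
    l->tail.time_out = (unsigned int)-1;
}

/* Scans backwards from the tail for the last link whose time_out is
 * <= tl->time_out and inserts tl after it. Returns the link the search
 * stopped on. If that is tl itself, tl is still linked (for example a
 * reused link that was zeroed without being unlinked); the neighbours
 * are then left untouched instead of being rewired through NULL links. */
static struct tlink *tlist_insert(struct tlist *l, struct tlink *tl)
{
    struct tlink *ptr;

    assert(l != NULL && tl != NULL);
    ptr = l->tail.prev;
    while (ptr != &l->head && ptr->time_out > tl->time_out)
        ptr = ptr->prev;
    if (ptr == tl)
        return ptr;

    tl->prev = ptr;
    tl->next = ptr->next;
    ptr->next->prev = tl;
    ptr->next = tl;
    return ptr;
}
```

Inserting a link with time_out 10 and then one with time_out 20 returns the head sentinel and then the first link. If the second link is zeroed while still linked and re-armed with time_out 32, the search stops on that link itself. A ptr==tl guard like this one only avoids the crash. As the thread concludes, the actual fix is to keep the freed cell from reaching the wait list at all.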
> > > >
> > > > The log file:
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]: ERROR:tm:sip_msg_cloner: no more share memory
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]: ERROR:tm:new_t: out of mem
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]: ERROR:tm:t_newtran: new_t failed
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]: WARNING:core:fm_malloc: Not enough free memory, will atempt defragmenation
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]: ERROR:tm:sip_msg_cloner: no more share memory
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]: ERROR:tm:new_t: out of mem
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]: ERROR:tm:t_newtran: new_t failed
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]: INFO:core:handle_sigs: child process 22255 exited by a signal 11
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]: INFO:core:handle_sigs: core was generated
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]: INFO:core:handle_sigs: terminating due to SIGCHLD
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22256]: INFO:core:sig_usr: signal 15 received
> > > >
> > > > _______________________________________________
> > > > Users mailing list
> > > > Users@lists.opensips.org
> > > > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> > >
> > > --
> > > Bogdan-Andrei Iancu
> > > OpenSIPS Bootcamp
> > > 15 - 19 November 2010, Edison, New Jersey, USA
> > > www.voice-system.ro