Fixes for SEGV in dialog module.
<!-- Kamailio Pull Request Template -->
<!--
IMPORTANT:
- for detailed contributing guidelines, read:
https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md
- pull requests must be done to master branch, unless they are backports
of fixes from master branch to a stable branch
- backports to stable branches must be done with 'git cherry-pick -x
...'
- code is contributed under BSD for core and main components (tm, sl, auth,
tls)
- code is contributed GPLv2 or a compatible license for the other components
- GPL code is contributed with OpenSSL licensing exception
-->
#### Pre-Submission Checklist
<!-- Go over all points below, and after creating the PR, tick all the
checkboxes that apply -->
<!-- All points should be verified, otherwise, read the CONTRIBUTING
guidelines from above-->
<!-- If you're unsure about any of these, don't hesitate to ask on
sr-dev mailing list -->
- [x] Commit message has the format required by CONTRIBUTING guide
- [x] Commits are split per component (core, individual modules, libs, utils,
...)
- [x] Each component has a single commit (if not, squash them into one commit)
- [x] No commits to README files for modules (changes must be done to docbook
files
in `doc/` subfolder, the README file is autogenerated)
#### Type Of Change
- [ ] Small bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)
#### Checklist:
<!-- Go over all points below, and after creating the PR, tick the
checkboxes that apply -->
- [ ] PR should be backported to stable branches
- [ ] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)
#### Description
Hi !
The problem - periodically, 2-3 times in а month, kamailio dumps core with SEGV.
Different OS versions (Ubuntu 18,20,22), different kamailio versions from 5.7.1
up to 5.8.4,
high call load, python kemi, actively used dialog module.
Analyzing core files shows more or less the same picture - SEGV happends in
dialog mododule
around the dialog vars, can be in rpc call (dlg.list), in destroy dlg, in
accessing dlg variable.
Backtrace can look like:
```
Core was generated by `/usr/sbin/kamailio -P /run/kamailio/kamailio.pid -f
/etc/kamailio/kamailio.cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8,
key=key@entry=0x7ffe67990d68) at dlg_var.c:236
236 dlg_var.c: No such file or directory.
(gdb) bt full
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8,
key=key@entry=0x7ffe67990d68) at dlg_var.c:236
var = 0x8
var_list = <optimized out>
#1 0x00007fa2a2ce0397 in get_dlg_varref (dlg=dlg@entry=0x7fa2818849b8,
key=key@entry=0x7ffe67990d68) at dlg_var.c:297
var = 0x0
__func__ = "get_dlg_varref"
#2 0x00007fa2a2ccd429 in ki_dlg_var_get_mode (msg=<optimized out>,
name=0x7ffe67990d68, rmode=0) at dialog.c:2462
dlg = 0x7fa2818849b8
pval = <optimized out>
#3 0x000055e2d5b87cbb in sr_kemi_exec_func (ket=ket@entry=0x7fa2a2d0c0c0
<sr_kemi_dialog_exports+1728>, msg=msg@entry=0x7fa2a34577c0,
pno=pno@entry=1, vps=vps@entry=0x7ffe67990d60) at core/kemiexec.c:82
ret = <optimized out>
__func__ = "sr_kemi_exec_func"
#4 0x00007fa2a2930376 in sr_apy_kemi_exec_func_ex
(ket=ket@entry=0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>,
self=self@entry=0x7fa280ce2b80, args=args@entry=0x7fa2804ba3a0,
idx=idx@entry=485) at apy_kemi.c:327
fname = <optimized out>
i = <optimized out>
ret = <optimized out>
vps = {{vtype = 2, v = {n = -2142122656, l = 140335914274144, s = {s =
0x7fa28051cd60 "rtpe_setid", len = 10}, dict = 0x7fa28051cd60}},
{vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype =
0, v = {
n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v
= {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0,
l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0,
s = {s = 0x0, len = 0}, dict = 0x0}}}
env_P = <optimized out>
lmsg = 0x7fa2a34577c0
xret = <optimized out>
slen = 10
alen = 1
pobj = <optimized out>
__func__ = "sr_apy_kemi_exec_func_ex"
#5 0x00007fa2a2931f45 in sr_apy_kemi_exec_func (self=0x7fa280ce2b80,
args=0x7fa2804ba3a0, idx=485) at apy_kemi.c:356
ket = 0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>
ret = 0x0
pstate = 0x0
pframe = 0x0
tvb = {tv_sec = 0, tv_usec = 0}
tve = {tv_sec = 0, tv_usec = 0}
tz = {tz_minuteswest = 2, tz_dsttime = 0}
tdiff = <optimized out>
__func__ = "sr_apy_kemi_exec_func"
#6 0x00007fa2a266f697 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
```
Inspecting dlg vars, the chain look like this:
```
(gdb) frame 0
#0 get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8,
key=key@entry=0x7ffe67990d68) at dlg_var.c:236
236 in dlg_var.c
(gdb) p *dlg.vars
$1 = {key = {s = 0x7fa2833ea040 "mos", len = 3}, value = {s =
0x7fa2833ea0b0 "4.3", len = 3}, vflags = 1, next = 0x7fa2825fa3d0}
(gdb) p *$.next
$2 = {key = {s = 0x7fa200332e34 <error: Cannot access memory at address
0x7fa200332e34>, len = -1061109568}, value = {s = 0xabcdefed <error:
Cannot access memory at address 0xabcdefed>, len = 49}, vflags = 1, next =
0x8}
(gdb)
```
Additionaly, inspecting logs i found that this is happends in SIP race
conditions, f.e. when caller and calle sends BYE at the same time and
kamailio processing two different transactions of same dialog in two different
workers, so present some competition from identical scenarios.
Long time i can't reproduce this in lab, until insert an artificial delay
into the scenario which process BYE request to increase competition.
This code quickly causes SEGV in race conditions with BYE:
```
#kemi get/set dlg var
while time.time()-start < 0.5:
for i in ('bbb','ccc'):
dlgvar=KSR.dialog.var_get(i)
KSR.dialog.var_sets(i,str(random.randint(1,500000)))
```
SEGV not fire if i change dialog var_get/set to kamscript variant:
```
while time.time()-start < 0.5:
for i in ('bbb','ccc'):
dlgvar=KSR.pv.get(f"$dlg_var({i})")
KSR.pv.sets(f"$dlg_var({i})", str(random.randint(1,500000)))
```
By analyzing source i found some places which are not properly covered by
locks, and some dangerous places
where direct pointer to dialog var structure used, which can change in
concurrent process,
so in first case i insert the locks, in second - it is better to use the clone
of variable in private PV buffer.
Also found some mistakes with dialog flags, which can potentionaly affect DMQ
and DB dialog behavior.
You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/4151
-- Commit Summary --
* dialog: covering by locks accessing/changing variables in kemi functions
-- File Changes --
M src/modules/dialog/dialog.c (36)
M src/modules/dialog/dlg_hash.h (2)
M src/modules/dialog/dlg_var.c (2)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/4151.patch
https://github.com/kamailio/kamailio/pull/4151.diff
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/pull/4151
You are receiving this because you are subscribed to this thread.
Message ID: <kamailio/kamailio/pull/[email protected]>
_______________________________________________
Kamailio - Development Mailing List -- [email protected]
To unsubscribe send an email to [email protected]
Important: keep the mailing list in the recipients, do not reply only to the
sender!