Thank you, I've already changed the value, but that didn't solve my issue with the blocking Kamailio.
I did some tests with an older Yocto image which contains Kamailio 5.5.4 and OpenSSL 1.1.1w. With this combination I was not able to reproduce the deadlock. Am Fr., 21. März 2025 um 16:49 Uhr schrieb Harald Kapper <[email protected]>: > Hi > > fyi – in case you need this: > > systemctl edit kamailio.service > > > > add: > > ### Editing /etc/systemd/system/kamailio.service.d/override.conf > > ### Anything between here and the comment below will become the contents > of the drop-in file > > > > [Service] > > LimitNOFILE=40960 > > > > ### Edits below this comment will be discarded > > > > (or whatever number you’d like) > > saving saves the override for systemd into (like) > /etc/systemd/system/kamailio.service.d/override.conf > > > > Afterwards you have to restart the service (systemctl restart > kamailio.service) > > > > And you should be fine. > > > > Regards, hk > > > > *Von:* Henning Westerholt via sr-users <[email protected]> > *Gesendet:* Donnerstag, 20. März 2025 14:25 > *An:* Kamailio (SER) - Users Mailing List <[email protected]> > *Cc:* Mathias Schneuwly <[email protected]>; Henning Westerholt < > [email protected]> > *Betreff:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too > many references: cannot splice (109) > > > > Hello Mathias, > > > > yes, you can set the open files e.g. in the systemd unit file for Kamailio. > > > > Cheers, > > > > Henning > > > > *From:* Mathias Schneuwly via sr-users <[email protected]> > *Sent:* Donnerstag, 20. März 2025 13:43 > *To:* [email protected] > *Cc:* Mathias Schneuwly <[email protected]> > *Subject:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too > many references: cannot splice (109) > > > > > I'll also give a try with 6.0.1 later. > > Meanwhile, I've tested 5.8.6 and 6.0.1 and both show the same issue. > > > > Am Do., 20. März 2025 um 10:21 Uhr schrieb Mathias Schneuwly < > [email protected]>: > > Hi Henning > > > > > Maybe also check the actual limits by “cat /proc/$kamailio-pid/limits” > Not 100% if its related, but if you are looking to scale this would be for > sure required. > > The limit was > > Max open files 1024 524288 files > > and I changed it now to > > Max open files 65536 65536 files > > > > Are there some recommendations for the limit? Would it make sense to > change that in the provided systemd file if you say that 1024 is not that > much? > > > > But anyway, it fails with 1024 and with 65536. > > > > I've also updated to 5.8.5 and it shows the same issue. I'll also give a > try with 6.0.1 later. > > > > > About the tls_thread_mode – you should use mode 2 if you are running > with openssl 3.x, otherwise you probably will observe random crashes. > > We are running OpenSSL 3.2.3 and tls_thread_mode is set to 2. But it still > blocks. > > > > Best regards > > Mathias > > > > Am Do., 20. März 2025 um 08:42 Uhr schrieb Henning Westerholt < > [email protected]>: > > Hello Mathias, > > > > 1024 open files are not that much, you should increase it for the kamailio > user. Maybe also check the actual limits by “cat > /proc/$kamailio-pid/limits” > > Not 100% if its related, but if you are looking to scale this would be for > sure required. > > > > About the tls_thread_mode – you should use mode 2 if you are running with > openssl 3.x, otherwise you probably will observe random crashes. > > > > If you are seeing the mentioned problem with 15 phones, this is clearly a > problem and should be further investigated. > > > > Cheers, > > > > Henning > > > > > > *From:* Mathias Schneuwly via sr-users <[email protected]> > *Sent:* Donnerstag, 20. März 2025 08:07 > *To:* [email protected] > *Cc:* Mathias Schneuwly <[email protected]> > *Subject:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too > many references: cannot splice (109) > > > > Hi > > > > Thanks for the reply. We are using the following limits and I guess they > are the default values > > real-time non-blocking time (microseconds, -R) unlimited > core file size (blocks, -c) unlimited > data seg size (kbytes, -d) unlimited > scheduling priority (-e) 0 > file size (blocks, -f) unlimited > pending signals (-i) 15666 > max locked memory (kbytes, -l) 8192 > max memory size (kbytes, -m) unlimited > open files (-n) 1024 > pipe size (512 bytes, -p) 8 > POSIX message queues (bytes, -q) 819200 > real-time priority (-r) 0 > stack size (kbytes, -s) 8192 > cpu time (seconds, -t) unlimited > max user processes (-u) 15666 > virtual memory (kbytes, -v) unlimited > file locks (-x) unlimited > > > > But to me, it looks more like a deadlock somewhere around SSL_accept. > According to the stack trace generated with "kamctl trap", all 8 TCP > receiver processes are waiting in "#1 0x00007fa14ca04830 in > pthread_rwlock_wrlock () from /usr/lib/libc.so.6". I guess the message "Too > many references: cannot splice (109)" is just the consequence as the main > process is no longer able to dispatch jobs to the child workers. > > > > Meanwhile, I'm able to reproduce the issue. It requires a bit of brute > force, but it deadlocks reliably after a few seconds. > > > > I'm flooding Kamailio with register messages using sipexer. I'm running > the following command 3 times in parallel and always after a few seconds, > Kamailio is completely blocked. > > watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2" -au "2" > -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru > sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer > -register -vl 3 -co -com -ex 0 -fu "2" -au "2" -ap "test" -tk <path to > key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain> > tls:<fqdn>:5061" > > > > I played a bit with the parameters "modparam("tls", "init_mode", 1)" and > "tls_threads_mode = 2", but it didn't seem to change anything. > > > > Please have in mind that we usually do not stress Kamailio that much. We > have seen the issue a couple of times with around 5 to 15 SIP phones > registered. But in this usual setup, the problem does not occur reliably. > > > > Best regards > > Mathias > > > > Am Di., 18. März 2025 um 21:53 Uhr schrieb Henning Westerholt < > [email protected]>: > > Hello, > > > > check the ulimit settings for the kamailio user and general resource usage > on that system. In development branch there is also a feature present to > print the actual ulimits on startup. > > > > Cheers, > > > > Henning > > > > *From:* Mathias Schneuwly via sr-users <[email protected]> > *Sent:* Montag, 17. März 2025 13:41 > *To:* [email protected] > *Cc:* Mathias Schneuwly <[email protected]> > *Subject:* [SR-Users] send_fd(): sendmsg failed sending 55 on 52: Too > many references: cannot splice (109) > > > > Hi > > > > We are using kamailio 5.8.4. > > > > Recently, we noticed, that after a few days, we see the following messages > from Kamailio `2025 Mar 17 10:22:11 ttel CRITICAL: <core> > [core/pass_fd.c:193]: send_fd(): sendmsg failed sending 55 on 52: Too many > references: cannot splice (109)`. > > > > When this happens, devices are no longer able to register. At the moment > we have no idea what might be wrong. > > > > Unfortunately, the internet does not reveal much information regarding > this issue. In one issue related to Kamailio ( > https://lists.kamailio.org/mailman3/hyperkitty/list/[email protected]/thread/FKN25CGV27KA3GTS3ED3E7V3WKRZALRP/), > I found the command `kamctl trap` and when I execute that, I see one issue, > which might be related, but I'm just guessing... > > > > ``` > > ---start 1782236 ----------------------------------------------------- > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/usr/lib/libthread_db.so.1". > 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6 > $1 = 26 > $2 = {pid = 1782236, unix_sock = 36, idx = 0, status = 1, rank = 17, desc > = "tcp receiver (generic) child=0", '\000' <repeats 97 times>} > #0 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6 > No symbol table info available. > #1 0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6 > No symbol table info available. > #2 0x00007fa14ac20b49 in CRYPTO_THREAD_write_lock () from > /usr/lib/libcrypto.so.3 > No symbol table info available. > #3 0x00007fa14adbd5da in ?? () from /usr/lib/libcrypto.so.3 > No symbol table info available. > #4 0x00007fa14adbd9ca in ?? () from /usr/lib/libcrypto.so.3 > No symbol table info available. > #5 0x00007fa14adbc907 in ?? () from /usr/lib/libcrypto.so.3 > No symbol table info available. > #6 0x00007fa14adbd26c in ?? () from /usr/lib/libcrypto.so.3 > No symbol table info available. > #7 0x00007fa14adbd68e in ?? () from /usr/lib/libcrypto.so.3 > No symbol table info available. > #8 0x00007fa14abe75c6 in ?? () from /usr/lib/libcrypto.so.3 > No symbol table info available. > #9 0x00007fa14abe840b in EVP_RAND_generate () from /usr/lib/libcrypto.so.3 > No symbol table info available. > #10 0x00007fa14acf9282 in RAND_bytes_ex () from /usr/lib/libcrypto.so.3 > No symbol table info available. > #11 0x00007fa14b06cce8 in ?? () from /usr/lib/libssl.so.3 > No symbol table info available. > #12 0x00007fa14b05bdf7 in ?? () from /usr/lib/libssl.so.3 > No symbol table info available. > #13 0x00007fa14b0ef036 in tls_accept (c=c@entry=0x7fa1475c0dc0, > error=error@entry=0x7ffebd9443b8) at tls_server.c:457 > ret = <optimized out> > ssl = warning: could not convert 'ssl_st' from the host encoding > (ANSI_X3.4-1968) to UTF-32. > This normally should not happen, please file a bug report. > 0x7fa147668bd0 > cert = <optimized out> > tls_c = 0x7fa14768f490 > tls_log = <optimized out> > __func__ = "tls_accept" > #14 0x00007fa14b0f7540 in tls_h_read_f (c=0x7fa1475c0dc0, > flags=0x7ffebd9647d0) at tls_server.c:1153 > r = 0x7fa1475c0ee8 > bytes_free = 16383 > bytes_read = <optimized out> > read_size = <optimized out> > ssl_error = 0 > ssl_read = 0 > ssl = 0x7fa147668bd0 > rd_buf = > "\026\003\003\004\016\v\000\004\n\000\004\a\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\026\0270\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006\003U\004\003\f\versaimfs.ch0\036\027\r230302163519Z\027\r291230163519Z0`1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003"... > wr_buf = > "\026\003\003\000A\002\000\000=\003\003\371\204\326\361\262-9\300\344gV\2316\373\235D\314\225u8\v}\215\254DOWNGRD\001\000\3000\000\000\025\377\001\000\001\000\000\v\000\004\003\000\001\002\000#\000\000\000\027\000\000\026\003\003\a\267\v\000\a\263\000\a\260\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\r>0\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006"... > rd = {buf = 0x7ffebd944480 "\026\003\003\004\016\v", pos = 1405, > used = 1405, size = 65536} > wr = {buf = 0x7ffebd954480 "\026\003\003", pos = 0, used = 0, size > = 65536} > tls_c = 0x7fa14768f490 > enc_rd_buf = <optimized out> > n = 0 > flush_flags = 0 > err_src = <optimized out> > ip_buf = '\000' <repeats 63 times> > x = <optimized out> > tls_dbg = <optimized out> > __func__ = "tls_h_read_f" > redo_read = <optimized out> > continue_ssl_read = <optimized out> > __llevel = <optimized out> > __kld = <optimized out> > __llevel = <optimized out> > __kld = <optimized out> > #15 0x000055d2519871b4 in tcp_read_headers (c=c@entry=0x7fa1475c0dc0, > read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:445 > bytes = <optimized out> > remaining = <optimized out> > p = <optimized out> > r = 0x7fa1475c0ee8 > mc = <optimized out> > body_len = <optimized out> > tvnow = {tv_sec = 1742100857, tv_usec = 944923} > tvdiff = <optimized out> > mfline = <optimized out> > mtransid = <optimized out> > __func__ = "tcp_read_headers" > #16 0x000055d251989cef in tcp_read_req (con=0x7fa1475c0dc0, > bytes_read=bytes_read@entry=0x7ffebd9647c8, > read_flags=read_flags@entry=0x7ffebd9647d0) > at core/tcp_read.c:1509 > bytes = <optimized out> > total_bytes = 0 > resp = 1 > size = <optimized out> > req = 0x7fa1475c0ee8 > dst = {send_sock = 0x55d251a91020, to = {s = {sa_family = 52024, > sa_data = "\257Q\322U\000\000\002\000\000\000\376\177\000"}, sin = > {sin_family = 52024, sin_port = 20911, sin_addr = {s_addr = 21970}, > sin_zero = "\002\000\000\000\376\177\000"}, sin6 = {sin6_family = 52024, > sin6_port = 20911, sin6_flowinfo = 21970, sin6_addr = {__in6_u = > {__u6_addr8 = > "\002\000\000\000\376\177\000\000\030\000\000\000\000\000\000", __u6_addr16 > = {2, 0, 32766, 0, 24, 0, 0, 0}, __u6_addr32 = {2, 32766, 24, 0}}}, > sin6_scope_id = 0}, sas = {ss_family = 52024, __ss_padding = > "\257Q\322U\000\000\002\000\000\000\376\177\000\000\030", '\000' <repeats > 23 times>, > "\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\262+\226Q\001 > \000\000\200,5L\241\177\000\000 \020\251Q\322U\000\000 > \020\251Q\322U\000\000\200\303\257Q\322U\000\00009SG\241\177\000\000\330G\226\275\376\177\000\000\b\000\000\000\000\000\000", > __ss_align = 20}}, id = 1, send_flags = {f = 1, blst_imask = 0}, proto = 14 > '\016', proto_pad0 = 0 '\000', proto_pad1 = 0} > c = <optimized out> > ret = <optimized out> > again = <optimized out> > __func__ = "tcp_read_req" > #17 0x000055d25198ebec in handle_io (fm=fm@entry=0x7fa14c352c80, > events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1906 > ret = 8 > n = 0 > read_flags = RD_CONN_SHORT_READ > con = 0x7fa1475c0dc0 > s = 14 > resp = <optimized out> > t = <optimized out> > ee = 0x0 > __func__ = "handle_io" > error = <optimized out> > #18 0x000055d25199469b in io_wait_loop_epoll (repeat=repeat@entry=0, t=2, > h=0x55d251c49d60 <io_w>) at core/io_wait.h:1075 > n = 1 > r = 0 > fm = 0x7fa14c352c80 > revents = 1 > __func__ = "io_wait_loop_epoll" > #19 0x000055d251994e87 in tcp_receive_loop (unix_sock=<optimized out>) at > core/tcp_read.c:2026 > __func__ = "tcp_receive_loop" > #20 0x000055d25197e72c in tcp_init_children > (woneinit=woneinit@entry=0x7ffebd964d1c) > at core/tcp_main.c:5391 > r = <optimized out> > i = <optimized out> > reader_fd_1 = 39 > pid = <optimized out> > si_desc = "tcp receiver > (generic)\000\000`M\226\275\376\177\000\000\001\000\000\000\000\000\000\000\210\035\223Q\322U\000\000 > \020\251Q\322U\000\000\220\315\257Q\322U\000\000\220\242/L\241\177\000\000\025\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000%\020\251Q\322U\000\000HU\257Q\322U\000\000e\005\000\000\000\000\000\000 > \020\251Q\322U\000\000\3750\235Q\322U\000" > si = <optimized out> > __func__ = "tcp_init_children" > error = <optimized out> > #21 0x000055d2517c6c7e in main_loop () at main.c:1950 > i = <optimized out> > pid = <optimized out> > si = <optimized out> > si_desc = "udp receiver child=7 > sock=10.0.90.1:5060\000\000\000\000\322U\000\000?\342\017K\241\177\000\000\022\000\000\000\377\377\377\377\200\006\265L\241\177\000\000\253m\237L\241\177\000\000\361|\003\000\000\000\000\000\t\000\000\000\000\000\000\000Sat > Mar 15 05:25:32 2025\n\000\264L\241\177\000" > nrprocs = <optimized out> > woneinit = 1 > __func__ = "main_loop" > error = <optimized out> > #22 0x000055d2517bc151 in main (argc=<optimized out>, argv=<optimized > out>) at main.c:3256 > cfg_stream = <optimized out> > c = <optimized out> > r = <optimized out> > tmp = 0x7ffebd966e7f "" > tmp_len = 32673 > port = 5060 > proto = 0 > aproto = 0 > ahost = 0x0 > aport = 0 > options = 0x55d251aa74b8 > ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" > ret = -1 > seed = 3209119183 > rfd = <optimized out> > debug_save = <optimized out> > debug_flag = <optimized out> > dont_fork_cnt = 0 > n_lst = <optimized out> > p = <optimized out> > st = {st_dev = 24, st_ino = 46237, st_nlink = 2, st_mode = 16888, > st_uid = 899, st_gid = 899, __pad0 = 0, st_rdev = 0, st_size = 40, > st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1742012732, tv_nsec = > 217328715}, st_mtim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_ctim = > {tv_sec = 1742012732, tv_nsec = 217328715}, __glibc_reserved = {0, 0, 0}} > l1 = <optimized out> > tbuf = '\000' <repeats 32 times>, > "\030\202\237\275\376\177\000\000@\002\000\000@\003\000\000\001", '\000' > <repeats 23 times>, "@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ > \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ > \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ > \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ > \003\000\000@\003\000\000@\003", '\000' <repeats 11 times>, "\001", > '\000' <repeats 118 times>... > option_index = 12 > long_options = {{name = 0x55d251a9230b "help", has_arg = 0, flag = > 0x0, val = 104}, {name = 0x55d251a937a4 "version", has_arg = 0, flag = 0x0, > val = 118}, {name = 0x55d251a9826a "alias", has_arg = 1, flag = 0x0, val = > 1024}, {name = 0x55d251a92310 "subst", has_arg = 1, flag = 0x0, val = > 1025}, {name = 0x55d251a92316 "substdef", has_arg = 1, flag = 0x0, val = > 1026}, {name = 0x55d251a9231f "substdefs", has_arg = 1, flag = 0x0, val = > 1027}, {name = 0x55d251a92329 "server-id", has_arg = 1, flag = 0x0, val = > 1028}, {name = 0x55d251a92333 "loadmodule", has_arg = 1, flag = 0x0, val = > 1029}, {name = 0x55d251a9233e "modparam", has_arg = 1, flag = 0x0, val = > 1030}, {name = 0x55d251a92347 "log-engine", has_arg = 1, flag = 0x0, val = > 1031}, {name = 0x55d251a938c1 "debug", has_arg = 1, flag = 0x0, val = > 1032}, {name = 0x55d251a92352 "cfg-print", has_arg = 0, flag = 0x0, val = > 1033}, {name = 0x55d251a9235c "atexit", has_arg = 1, flag = 0x0, val = > 1034}, {name = 0x55d251a92363 "all-errors", has_arg = 0, flag = 0x0, val = > 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} > __func__ = "main" > [Inferior 1 (process 1782236) detached] > ---end 1782236 ------------------------------------------------------- > > ``` > > > > Does anybody have an idea what this could be? I never saw that with > previous versions, and now we can see it on several instances, but we are > yet unable to reproduce. > > > > Not sure if this might be related, but we also see many messages like `Mar > 17 13:39:03 ttel /usr/sbin/kamailio[2882127]: CRITICAL: <core> > [core/tcp_main.c:5558]: tcp_timer_check_connections(): message processing > timeout on connection id: 402 (state: 0) - closing` > > > > Best regards > > Mathias > >
__________________________________________________________ Kamailio - Users Mailing List - Non Commercial Discussions -- [email protected] To unsubscribe send an email to [email protected] Important: keep the mailing list in the recipients, do not reply only to the sender!
