Thank you, I've already changed the value, but that didn't solve my issue
with the blocking Kamailio.

I did some tests with an older Yocto image which contains Kamailio 5.5.4
and OpenSSL 1.1.1w. With this combination I was not able to reproduce the
deadlock.

Am Fr., 21. März 2025 um 16:49 Uhr schrieb Harald Kapper <[email protected]>:

> Hi
>
> fyi – in case you need this:
>
> systemctl edit kamailio.service
>
>
>
> add:
>
> ### Editing /etc/systemd/system/kamailio.service.d/override.conf
>
> ### Anything between here and the comment below will become the contents
> of the drop-in file
>
>
>
> [Service]
>
> LimitNOFILE=40960
>
>
>
> ### Edits below this comment will be discarded
>
>
>
> (or whatever number you’d like)
>
> saving saves the override for systemd into (like)
> /etc/systemd/system/kamailio.service.d/override.conf
>
>
>
> Afterwards you have to restart the service (systemctl restart
> kamailio.service)
>
>
>
> And you should be fine.
>
>
>
> Regards, hk
>
>
>
> *Von:* Henning Westerholt via sr-users <[email protected]>
> *Gesendet:* Donnerstag, 20. März 2025 14:25
> *An:* Kamailio (SER) - Users Mailing List <[email protected]>
> *Cc:* Mathias Schneuwly <[email protected]>; Henning Westerholt <
> [email protected]>
> *Betreff:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too
> many references: cannot splice (109)
>
>
>
> Hello Mathias,
>
>
>
> yes, you can set the open files e.g. in the systemd unit file for Kamailio.
>
>
>
> Cheers,
>
>
>
> Henning
>
>
>
> *From:* Mathias Schneuwly via sr-users <[email protected]>
> *Sent:* Donnerstag, 20. März 2025 13:43
> *To:* [email protected]
> *Cc:* Mathias Schneuwly <[email protected]>
> *Subject:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too
> many references: cannot splice (109)
>
>
>
> > I'll also give a try with 6.0.1 later.
>
> Meanwhile, I've tested 5.8.6 and 6.0.1 and both show the same issue.
>
>
>
> Am Do., 20. März 2025 um 10:21 Uhr schrieb Mathias Schneuwly <
> [email protected]>:
>
> Hi Henning
>
>
>
> > Maybe also check the actual limits by “cat /proc/$kamailio-pid/limits”
> Not 100% if its related, but if you are looking to scale this would be for
> sure required.
>
> The limit was
>
> Max open files            1024                 524288               files
>
> and I changed it now to
>
> Max open files            65536                65536                files
>
>
>
> Are there some recommendations for the limit? Would it make sense to
> change that in the provided systemd file if you say that 1024 is not that
> much?
>
>
>
> But anyway, it fails with 1024 and with 65536.
>
>
>
> I've also updated to 5.8.5 and it shows the same issue. I'll also give a
> try with 6.0.1 later.
>
>
>
> > About the tls_thread_mode – you should use mode 2 if you are running
> with openssl 3.x, otherwise you probably will observe random crashes.
>
> We are running OpenSSL 3.2.3 and tls_thread_mode is set to 2. But it still
> blocks.
>
>
>
> Best regards
>
> Mathias
>
>
>
> Am Do., 20. März 2025 um 08:42 Uhr schrieb Henning Westerholt <
> [email protected]>:
>
> Hello Mathias,
>
>
>
> 1024 open files are not that much, you should increase it for the kamailio
> user. Maybe also check the actual limits by “cat
> /proc/$kamailio-pid/limits”
>
> Not 100% if its related, but if you are looking to scale this would be for
> sure required.
>
>
>
> About the tls_thread_mode – you should use mode 2 if you are running with
> openssl 3.x, otherwise you probably will observe random crashes.
>
>
>
> If you are seeing the mentioned problem with 15 phones, this is clearly a
> problem and should be further investigated.
>
>
>
> Cheers,
>
>
>
> Henning
>
>
>
>
>
> *From:* Mathias Schneuwly via sr-users <[email protected]>
> *Sent:* Donnerstag, 20. März 2025 08:07
> *To:* [email protected]
> *Cc:* Mathias Schneuwly <[email protected]>
> *Subject:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too
> many references: cannot splice (109)
>
>
>
> Hi
>
>
>
> Thanks for the reply. We are using the following limits and I guess they
> are the default values
>
> real-time non-blocking time  (microseconds, -R) unlimited
> core file size              (blocks, -c) unlimited
> data seg size               (kbytes, -d) unlimited
> scheduling priority                 (-e) 0
> file size                   (blocks, -f) unlimited
> pending signals                     (-i) 15666
> max locked memory           (kbytes, -l) 8192
> max memory size             (kbytes, -m) unlimited
> open files                          (-n) 1024
> pipe size                (512 bytes, -p) 8
> POSIX message queues         (bytes, -q) 819200
> real-time priority                  (-r) 0
> stack size                  (kbytes, -s) 8192
> cpu time                   (seconds, -t) unlimited
> max user processes                  (-u) 15666
> virtual memory              (kbytes, -v) unlimited
> file locks                          (-x) unlimited
>
>
>
> But to me, it looks more like a deadlock somewhere around SSL_accept.
> According to the stack trace generated with "kamctl trap", all 8 TCP
> receiver processes are waiting in "#1  0x00007fa14ca04830 in
> pthread_rwlock_wrlock () from /usr/lib/libc.so.6". I guess the message "Too
> many references: cannot splice (109)" is just the consequence as the main
> process is no longer able to dispatch jobs to the child workers.
>
>
>
> Meanwhile, I'm able to reproduce the issue. It requires a bit of brute
> force, but it deadlocks reliably after a few seconds.
>
>
>
> I'm flooding Kamailio with register messages using sipexer. I'm running
> the following command 3 times in parallel and always after a few seconds,
> Kamailio is completely blocked.
>
> watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2"  -au "2"
> -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru
> sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer
> -register -vl 3 -co -com -ex 0 -fu "2"  -au "2" -ap "test" -tk <path to
> key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain>
> tls:<fqdn>:5061"
>
>
>
> I played a bit with the parameters "modparam("tls", "init_mode", 1)" and
> "tls_threads_mode = 2", but it didn't seem to change anything.
>
>
>
> Please have in mind that we usually do not stress Kamailio that much. We
> have seen the issue a couple of times with around 5 to 15 SIP phones
> registered. But in this usual setup, the problem does not occur reliably.
>
>
>
> Best regards
>
> Mathias
>
>
>
> Am Di., 18. März 2025 um 21:53 Uhr schrieb Henning Westerholt <
> [email protected]>:
>
> Hello,
>
>
>
> check the ulimit settings for the kamailio user and general resource usage
> on that system. In development branch there is also a feature present to
> print the actual ulimits on startup.
>
>
>
> Cheers,
>
>
>
> Henning
>
>
>
> *From:* Mathias Schneuwly via sr-users <[email protected]>
> *Sent:* Montag, 17. März 2025 13:41
> *To:* [email protected]
> *Cc:* Mathias Schneuwly <[email protected]>
> *Subject:* [SR-Users] send_fd(): sendmsg failed sending 55 on 52: Too
> many references: cannot splice (109)
>
>
>
> Hi
>
>
>
> We are using kamailio 5.8.4.
>
>
>
> Recently, we noticed, that after a few days, we see the following messages
> from Kamailio `2025 Mar 17 10:22:11 ttel CRITICAL: <core>
> [core/pass_fd.c:193]: send_fd(): sendmsg failed sending 55 on 52: Too many
> references: cannot splice (109)`.
>
>
>
> When this happens, devices are no longer able to register. At the moment
> we have no idea what might be wrong.
>
>
>
> Unfortunately, the internet does not reveal much information regarding
> this issue. In one issue related to Kamailio (
> https://lists.kamailio.org/mailman3/hyperkitty/list/[email protected]/thread/FKN25CGV27KA3GTS3ED3E7V3WKRZALRP/),
> I found the command `kamctl trap` and when I execute that, I see one issue,
> which might be related, but I'm just guessing...
>
>
>
> ```
>
> ---start 1782236 -----------------------------------------------------
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/usr/lib/libthread_db.so.1".
> 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6
> $1 = 26
> $2 = {pid = 1782236, unix_sock = 36, idx = 0, status = 1, rank = 17, desc
> = "tcp receiver (generic) child=0", '\000' <repeats 97 times>}
> #0  0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6
> No symbol table info available.
> #1  0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6
> No symbol table info available.
> #2  0x00007fa14ac20b49 in CRYPTO_THREAD_write_lock () from
> /usr/lib/libcrypto.so.3
> No symbol table info available.
> #3  0x00007fa14adbd5da in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #4  0x00007fa14adbd9ca in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #5  0x00007fa14adbc907 in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #6  0x00007fa14adbd26c in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #7  0x00007fa14adbd68e in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #8  0x00007fa14abe75c6 in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #9  0x00007fa14abe840b in EVP_RAND_generate () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #10 0x00007fa14acf9282 in RAND_bytes_ex () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #11 0x00007fa14b06cce8 in ?? () from /usr/lib/libssl.so.3
> No symbol table info available.
> #12 0x00007fa14b05bdf7 in ?? () from /usr/lib/libssl.so.3
> No symbol table info available.
> #13 0x00007fa14b0ef036 in tls_accept (c=c@entry=0x7fa1475c0dc0,
> error=error@entry=0x7ffebd9443b8) at tls_server.c:457
>         ret = <optimized out>
>         ssl = warning: could not convert 'ssl_st' from the host encoding
> (ANSI_X3.4-1968) to UTF-32.
> This normally should not happen, please file a bug report.
> 0x7fa147668bd0
>         cert = <optimized out>
>         tls_c = 0x7fa14768f490
>         tls_log = <optimized out>
>         __func__ = "tls_accept"
> #14 0x00007fa14b0f7540 in tls_h_read_f (c=0x7fa1475c0dc0,
> flags=0x7ffebd9647d0) at tls_server.c:1153
>         r = 0x7fa1475c0ee8
>         bytes_free = 16383
>         bytes_read = <optimized out>
>         read_size = <optimized out>
>         ssl_error = 0
>         ssl_read = 0
>         ssl = 0x7fa147668bd0
>         rd_buf =
> "\026\003\003\004\016\v\000\004\n\000\004\a\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\026\0270\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006\003U\004\003\f\versaimfs.ch0\036\027\r230302163519Z\027\r291230163519Z0`1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003"...
>         wr_buf =
> "\026\003\003\000A\002\000\000=\003\003\371\204\326\361\262-9\300\344gV\2316\373\235D\314\225u8\v}\215\254DOWNGRD\001\000\3000\000\000\025\377\001\000\001\000\000\v\000\004\003\000\001\002\000#\000\000\000\027\000\000\026\003\003\a\267\v\000\a\263\000\a\260\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\r>0\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006"...
>         rd = {buf = 0x7ffebd944480 "\026\003\003\004\016\v", pos = 1405,
> used = 1405, size = 65536}
>         wr = {buf = 0x7ffebd954480 "\026\003\003", pos = 0, used = 0, size
> = 65536}
>         tls_c = 0x7fa14768f490
>         enc_rd_buf = <optimized out>
>         n = 0
>         flush_flags = 0
>         err_src = <optimized out>
>         ip_buf = '\000' <repeats 63 times>
>         x = <optimized out>
>         tls_dbg = <optimized out>
>         __func__ = "tls_h_read_f"
>         redo_read = <optimized out>
>         continue_ssl_read = <optimized out>
>         __llevel = <optimized out>
>         __kld = <optimized out>
>         __llevel = <optimized out>
>         __kld = <optimized out>
> #15 0x000055d2519871b4 in tcp_read_headers (c=c@entry=0x7fa1475c0dc0,
> read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:445
>         bytes = <optimized out>
>         remaining = <optimized out>
>         p = <optimized out>
>         r = 0x7fa1475c0ee8
>         mc = <optimized out>
>         body_len = <optimized out>
>         tvnow = {tv_sec = 1742100857, tv_usec = 944923}
>         tvdiff = <optimized out>
>         mfline = <optimized out>
>         mtransid = <optimized out>
>         __func__ = "tcp_read_headers"
> #16 0x000055d251989cef in tcp_read_req (con=0x7fa1475c0dc0,
> bytes_read=bytes_read@entry=0x7ffebd9647c8, 
> read_flags=read_flags@entry=0x7ffebd9647d0)
> at core/tcp_read.c:1509
>         bytes = <optimized out>
>         total_bytes = 0
>         resp = 1
>         size = <optimized out>
>         req = 0x7fa1475c0ee8
>         dst = {send_sock = 0x55d251a91020, to = {s = {sa_family = 52024,
> sa_data = "\257Q\322U\000\000\002\000\000\000\376\177\000"}, sin =
> {sin_family = 52024, sin_port = 20911, sin_addr = {s_addr = 21970},
> sin_zero = "\002\000\000\000\376\177\000"}, sin6 = {sin6_family = 52024,
> sin6_port = 20911, sin6_flowinfo = 21970, sin6_addr = {__in6_u =
> {__u6_addr8 =
> "\002\000\000\000\376\177\000\000\030\000\000\000\000\000\000", __u6_addr16
> = {2, 0, 32766, 0, 24, 0, 0, 0}, __u6_addr32 = {2, 32766, 24, 0}}},
> sin6_scope_id = 0}, sas = {ss_family = 52024, __ss_padding =
> "\257Q\322U\000\000\002\000\000\000\376\177\000\000\030", '\000' <repeats
> 23 times>,
> "\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\262+\226Q\001
> \000\000\200,5L\241\177\000\000 \020\251Q\322U\000\000
> \020\251Q\322U\000\000\200\303\257Q\322U\000\00009SG\241\177\000\000\330G\226\275\376\177\000\000\b\000\000\000\000\000\000",
> __ss_align = 20}}, id = 1, send_flags = {f = 1, blst_imask = 0}, proto = 14
> '\016', proto_pad0 = 0 '\000', proto_pad1 = 0}
>         c = <optimized out>
>         ret = <optimized out>
>         again = <optimized out>
>         __func__ = "tcp_read_req"
> #17 0x000055d25198ebec in handle_io (fm=fm@entry=0x7fa14c352c80,
> events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1906
>         ret = 8
>         n = 0
>         read_flags = RD_CONN_SHORT_READ
>         con = 0x7fa1475c0dc0
>         s = 14
>         resp = <optimized out>
>         t = <optimized out>
>         ee = 0x0
>         __func__ = "handle_io"
>         error = <optimized out>
> #18 0x000055d25199469b in io_wait_loop_epoll (repeat=repeat@entry=0, t=2,
> h=0x55d251c49d60 <io_w>) at core/io_wait.h:1075
>         n = 1
>         r = 0
>         fm = 0x7fa14c352c80
>         revents = 1
>         __func__ = "io_wait_loop_epoll"
> #19 0x000055d251994e87 in tcp_receive_loop (unix_sock=<optimized out>) at
> core/tcp_read.c:2026
>         __func__ = "tcp_receive_loop"
> #20 0x000055d25197e72c in tcp_init_children 
> (woneinit=woneinit@entry=0x7ffebd964d1c)
> at core/tcp_main.c:5391
>         r = <optimized out>
>         i = <optimized out>
>         reader_fd_1 = 39
>         pid = <optimized out>
>         si_desc = "tcp receiver
> (generic)\000\000`M\226\275\376\177\000\000\001\000\000\000\000\000\000\000\210\035\223Q\322U\000\000
> \020\251Q\322U\000\000\220\315\257Q\322U\000\000\220\242/L\241\177\000\000\025\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000%\020\251Q\322U\000\000HU\257Q\322U\000\000e\005\000\000\000\000\000\000
> \020\251Q\322U\000\000\3750\235Q\322U\000"
>         si = <optimized out>
>         __func__ = "tcp_init_children"
>         error = <optimized out>
> #21 0x000055d2517c6c7e in main_loop () at main.c:1950
>         i = <optimized out>
>         pid = <optimized out>
>         si = <optimized out>
>         si_desc = "udp receiver child=7 
> sock=10.0.90.1:5060\000\000\000\000\322U\000\000?\342\017K\241\177\000\000\022\000\000\000\377\377\377\377\200\006\265L\241\177\000\000\253m\237L\241\177\000\000\361|\003\000\000\000\000\000\t\000\000\000\000\000\000\000Sat
> Mar 15 05:25:32 2025\n\000\264L\241\177\000"
>         nrprocs = <optimized out>
>         woneinit = 1
>         __func__ = "main_loop"
>         error = <optimized out>
> #22 0x000055d2517bc151 in main (argc=<optimized out>, argv=<optimized
> out>) at main.c:3256
>         cfg_stream = <optimized out>
>         c = <optimized out>
>         r = <optimized out>
>         tmp = 0x7ffebd966e7f ""
>         tmp_len = 32673
>         port = 5060
>         proto = 0
>         aproto = 0
>         ahost = 0x0
>         aport = 0
>         options = 0x55d251aa74b8
> ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
>         ret = -1
>         seed = 3209119183
>         rfd = <optimized out>
>         debug_save = <optimized out>
>         debug_flag = <optimized out>
>         dont_fork_cnt = 0
>         n_lst = <optimized out>
>         p = <optimized out>
>         st = {st_dev = 24, st_ino = 46237, st_nlink = 2, st_mode = 16888,
> st_uid = 899, st_gid = 899, __pad0 = 0, st_rdev = 0, st_size = 40,
> st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1742012732, tv_nsec =
> 217328715}, st_mtim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_ctim =
> {tv_sec = 1742012732, tv_nsec = 217328715}, __glibc_reserved = {0, 0, 0}}
>         l1 = <optimized out>
>         tbuf = '\000' <repeats 32 times>,
> "\030\202\237\275\376\177\000\000@\002\000\000@\003\000\000\001", '\000'
> <repeats 23 times>, "@\003\000\000@\003\000\000@\003\000\000@\003\000\000@
> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@
> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@
> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@
> \003\000\000@\003\000\000@\003", '\000' <repeats 11 times>, "\001",
> '\000' <repeats 118 times>...
>         option_index = 12
>         long_options = {{name = 0x55d251a9230b "help", has_arg = 0, flag =
> 0x0, val = 104}, {name = 0x55d251a937a4 "version", has_arg = 0, flag = 0x0,
> val = 118}, {name = 0x55d251a9826a "alias", has_arg = 1, flag = 0x0, val =
> 1024}, {name = 0x55d251a92310 "subst", has_arg = 1, flag = 0x0, val =
> 1025}, {name = 0x55d251a92316 "substdef", has_arg = 1, flag = 0x0, val =
> 1026}, {name = 0x55d251a9231f "substdefs", has_arg = 1, flag = 0x0, val =
> 1027}, {name = 0x55d251a92329 "server-id", has_arg = 1, flag = 0x0, val =
> 1028}, {name = 0x55d251a92333 "loadmodule", has_arg = 1, flag = 0x0, val =
> 1029}, {name = 0x55d251a9233e "modparam", has_arg = 1, flag = 0x0, val =
> 1030}, {name = 0x55d251a92347 "log-engine", has_arg = 1, flag = 0x0, val =
> 1031}, {name = 0x55d251a938c1 "debug", has_arg = 1, flag = 0x0, val =
> 1032}, {name = 0x55d251a92352 "cfg-print", has_arg = 0, flag = 0x0, val =
> 1033}, {name = 0x55d251a9235c "atexit", has_arg = 1, flag = 0x0, val =
> 1034}, {name = 0x55d251a92363 "all-errors", has_arg = 0, flag = 0x0, val =
> 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
>         __func__ = "main"
> [Inferior 1 (process 1782236) detached]
> ---end 1782236 -------------------------------------------------------
>
> ```
>
>
>
> Does anybody have an idea what this could be? I never saw that with
> previous versions, and now we can see it on several instances, but we are
> yet unable to reproduce.
>
>
>
> Not sure if this might be related, but we also see many messages like `Mar
> 17 13:39:03 ttel /usr/sbin/kamailio[2882127]: CRITICAL: <core>
> [core/tcp_main.c:5558]: tcp_timer_check_connections(): message processing
> timeout on connection id: 402 (state: 0) - closing`
>
>
>
> Best regards
>
> Mathias
>
>
__________________________________________________________
Kamailio - Users Mailing List - Non Commercial Discussions -- 
[email protected]
To unsubscribe send an email to [email protected]
Important: keep the mailing list in the recipients, do not reply only to the 
sender!

Reply via email to