Hi

Thanks for the reply. We are using the following limits and I guess they
are the default values
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15666
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15666
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

But to me, it looks more like a deadlock somewhere around SSL_accept.
According to the stack trace generated with "kamctl trap", all 8 TCP
receiver processes are waiting in "#1  0x00007fa14ca04830 in
pthread_rwlock_wrlock () from /usr/lib/libc.so.6". I guess the message "Too
many references: cannot splice (109)" is just the consequence as the main
process is no longer able to dispatch jobs to the child workers.

Meanwhile, I'm able to reproduce the issue. It requires a bit of brute
force, but it deadlocks reliably after a few seconds.

I'm flooding Kamailio with register messages using sipexer. I'm running the
following command 3 times in parallel and always after a few seconds,
Kamailio is completely blocked.
watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2"  -au "2"
-ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru
sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer
-register -vl 3 -co -com -ex 0 -fu "2"  -au "2" -ap "test" -tk <path to
key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain>
tls:<fqdn>:5061"

I played a bit with the parameters "modparam("tls", "init_mode", 1)" and
"tls_threads_mode = 2", but it didn't seem to change anything.

Please have in mind that we usually do not stress Kamailio that much. We
have seen the issue a couple of times with around 5 to 15 SIP phones
registered. But in this usual setup, the problem does not occur reliably.

Best regards
Mathias

Am Di., 18. März 2025 um 21:53 Uhr schrieb Henning Westerholt <[email protected]
>:

> Hello,
>
>
>
> check the ulimit settings for the kamailio user and general resource usage
> on that system. In development branch there is also a feature present to
> print the actual ulimits on startup.
>
>
>
> Cheers,
>
>
>
> Henning
>
>
>
> *From:* Mathias Schneuwly via sr-users <[email protected]>
> *Sent:* Montag, 17. März 2025 13:41
> *To:* [email protected]
> *Cc:* Mathias Schneuwly <[email protected]>
> *Subject:* [SR-Users] send_fd(): sendmsg failed sending 55 on 52: Too
> many references: cannot splice (109)
>
>
>
> Hi
>
>
>
> We are using kamailio 5.8.4.
>
>
>
> Recently, we noticed, that after a few days, we see the following messages
> from Kamailio `2025 Mar 17 10:22:11 ttel CRITICAL: <core>
> [core/pass_fd.c:193]: send_fd(): sendmsg failed sending 55 on 52: Too many
> references: cannot splice (109)`.
>
>
>
> When this happens, devices are no longer able to register. At the moment
> we have no idea what might be wrong.
>
>
>
> Unfortunately, the internet does not reveal much information regarding
> this issue. In one issue related to Kamailio (
> https://lists.kamailio.org/mailman3/hyperkitty/list/[email protected]/thread/FKN25CGV27KA3GTS3ED3E7V3WKRZALRP/),
> I found the command `kamctl trap` and when I execute that, I see one issue,
> which might be related, but I'm just guessing...
>
>
>
> ```
>
> ---start 1782236 -----------------------------------------------------
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/usr/lib/libthread_db.so.1".
> 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6
> $1 = 26
> $2 = {pid = 1782236, unix_sock = 36, idx = 0, status = 1, rank = 17, desc
> = "tcp receiver (generic) child=0", '\000' <repeats 97 times>}
> #0  0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6
> No symbol table info available.
> #1  0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6
> No symbol table info available.
> #2  0x00007fa14ac20b49 in CRYPTO_THREAD_write_lock () from
> /usr/lib/libcrypto.so.3
> No symbol table info available.
> #3  0x00007fa14adbd5da in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #4  0x00007fa14adbd9ca in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #5  0x00007fa14adbc907 in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #6  0x00007fa14adbd26c in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #7  0x00007fa14adbd68e in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #8  0x00007fa14abe75c6 in ?? () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #9  0x00007fa14abe840b in EVP_RAND_generate () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #10 0x00007fa14acf9282 in RAND_bytes_ex () from /usr/lib/libcrypto.so.3
> No symbol table info available.
> #11 0x00007fa14b06cce8 in ?? () from /usr/lib/libssl.so.3
> No symbol table info available.
> #12 0x00007fa14b05bdf7 in ?? () from /usr/lib/libssl.so.3
> No symbol table info available.
> #13 0x00007fa14b0ef036 in tls_accept (c=c@entry=0x7fa1475c0dc0,
> error=error@entry=0x7ffebd9443b8) at tls_server.c:457
>         ret = <optimized out>
>         ssl = warning: could not convert 'ssl_st' from the host encoding
> (ANSI_X3.4-1968) to UTF-32.
> This normally should not happen, please file a bug report.
> 0x7fa147668bd0
>         cert = <optimized out>
>         tls_c = 0x7fa14768f490
>         tls_log = <optimized out>
>         __func__ = "tls_accept"
> #14 0x00007fa14b0f7540 in tls_h_read_f (c=0x7fa1475c0dc0,
> flags=0x7ffebd9647d0) at tls_server.c:1153
>         r = 0x7fa1475c0ee8
>         bytes_free = 16383
>         bytes_read = <optimized out>
>         read_size = <optimized out>
>         ssl_error = 0
>         ssl_read = 0
>         ssl = 0x7fa147668bd0
>         rd_buf =
> "\026\003\003\004\016\v\000\004\n\000\004\a\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\026\0270\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006\003U\004\003\f\versaimfs.ch0\036\027\r230302163519Z\027\r291230163519Z0`1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003"...
>         wr_buf =
> "\026\003\003\000A\002\000\000=\003\003\371\204\326\361\262-9\300\344gV\2316\373\235D\314\225u8\v}\215\254DOWNGRD\001\000\3000\000\000\025\377\001\000\001\000\000\v\000\004\003\000\001\002\000#\000\000\000\027\000\000\026\003\003\a\267\v\000\a\263\000\a\260\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\r>0\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006"...
>         rd = {buf = 0x7ffebd944480 "\026\003\003\004\016\v", pos = 1405,
> used = 1405, size = 65536}
>         wr = {buf = 0x7ffebd954480 "\026\003\003", pos = 0, used = 0, size
> = 65536}
>         tls_c = 0x7fa14768f490
>         enc_rd_buf = <optimized out>
>         n = 0
>         flush_flags = 0
>         err_src = <optimized out>
>         ip_buf = '\000' <repeats 63 times>
>         x = <optimized out>
>         tls_dbg = <optimized out>
>         __func__ = "tls_h_read_f"
>         redo_read = <optimized out>
>         continue_ssl_read = <optimized out>
>         __llevel = <optimized out>
>         __kld = <optimized out>
>         __llevel = <optimized out>
>         __kld = <optimized out>
> #15 0x000055d2519871b4 in tcp_read_headers (c=c@entry=0x7fa1475c0dc0,
> read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:445
>         bytes = <optimized out>
>         remaining = <optimized out>
>         p = <optimized out>
>         r = 0x7fa1475c0ee8
>         mc = <optimized out>
>         body_len = <optimized out>
>         tvnow = {tv_sec = 1742100857, tv_usec = 944923}
>         tvdiff = <optimized out>
>         mfline = <optimized out>
>         mtransid = <optimized out>
>         __func__ = "tcp_read_headers"
> #16 0x000055d251989cef in tcp_read_req (con=0x7fa1475c0dc0,
> bytes_read=bytes_read@entry=0x7ffebd9647c8, 
> read_flags=read_flags@entry=0x7ffebd9647d0)
> at core/tcp_read.c:1509
>         bytes = <optimized out>
>         total_bytes = 0
>         resp = 1
>         size = <optimized out>
>         req = 0x7fa1475c0ee8
>         dst = {send_sock = 0x55d251a91020, to = {s = {sa_family = 52024,
> sa_data = "\257Q\322U\000\000\002\000\000\000\376\177\000"}, sin =
> {sin_family = 52024, sin_port = 20911, sin_addr = {s_addr = 21970},
> sin_zero = "\002\000\000\000\376\177\000"}, sin6 = {sin6_family = 52024,
> sin6_port = 20911, sin6_flowinfo = 21970, sin6_addr = {__in6_u =
> {__u6_addr8 =
> "\002\000\000\000\376\177\000\000\030\000\000\000\000\000\000", __u6_addr16
> = {2, 0, 32766, 0, 24, 0, 0, 0}, __u6_addr32 = {2, 32766, 24, 0}}},
> sin6_scope_id = 0}, sas = {ss_family = 52024, __ss_padding =
> "\257Q\322U\000\000\002\000\000\000\376\177\000\000\030", '\000' <repeats
> 23 times>,
> "\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\262+\226Q\001
> \000\000\200,5L\241\177\000\000 \020\251Q\322U\000\000
> \020\251Q\322U\000\000\200\303\257Q\322U\000\00009SG\241\177\000\000\330G\226\275\376\177\000\000\b\000\000\000\000\000\000",
> __ss_align = 20}}, id = 1, send_flags = {f = 1, blst_imask = 0}, proto = 14
> '\016', proto_pad0 = 0 '\000', proto_pad1 = 0}
>         c = <optimized out>
>         ret = <optimized out>
>         again = <optimized out>
>         __func__ = "tcp_read_req"
> #17 0x000055d25198ebec in handle_io (fm=fm@entry=0x7fa14c352c80,
> events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1906
>         ret = 8
>         n = 0
>         read_flags = RD_CONN_SHORT_READ
>         con = 0x7fa1475c0dc0
>         s = 14
>         resp = <optimized out>
>         t = <optimized out>
>         ee = 0x0
>         __func__ = "handle_io"
>         error = <optimized out>
> #18 0x000055d25199469b in io_wait_loop_epoll (repeat=repeat@entry=0, t=2,
> h=0x55d251c49d60 <io_w>) at core/io_wait.h:1075
>         n = 1
>         r = 0
>         fm = 0x7fa14c352c80
>         revents = 1
>         __func__ = "io_wait_loop_epoll"
> #19 0x000055d251994e87 in tcp_receive_loop (unix_sock=<optimized out>) at
> core/tcp_read.c:2026
>         __func__ = "tcp_receive_loop"
> #20 0x000055d25197e72c in tcp_init_children 
> (woneinit=woneinit@entry=0x7ffebd964d1c)
> at core/tcp_main.c:5391
>         r = <optimized out>
>         i = <optimized out>
>         reader_fd_1 = 39
>         pid = <optimized out>
>         si_desc = "tcp receiver
> (generic)\000\000`M\226\275\376\177\000\000\001\000\000\000\000\000\000\000\210\035\223Q\322U\000\000
> \020\251Q\322U\000\000\220\315\257Q\322U\000\000\220\242/L\241\177\000\000\025\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000%\020\251Q\322U\000\000HU\257Q\322U\000\000e\005\000\000\000\000\000\000
> \020\251Q\322U\000\000\3750\235Q\322U\000"
>         si = <optimized out>
>         __func__ = "tcp_init_children"
>         error = <optimized out>
> #21 0x000055d2517c6c7e in main_loop () at main.c:1950
>         i = <optimized out>
>         pid = <optimized out>
>         si = <optimized out>
>         si_desc = "udp receiver child=7 
> sock=10.0.90.1:5060\000\000\000\000\322U\000\000?\342\017K\241\177\000\000\022\000\000\000\377\377\377\377\200\006\265L\241\177\000\000\253m\237L\241\177\000\000\361|\003\000\000\000\000\000\t\000\000\000\000\000\000\000Sat
> Mar 15 05:25:32 2025\n\000\264L\241\177\000"
>         nrprocs = <optimized out>
>         woneinit = 1
>         __func__ = "main_loop"
>         error = <optimized out>
> #22 0x000055d2517bc151 in main (argc=<optimized out>, argv=<optimized
> out>) at main.c:3256
>         cfg_stream = <optimized out>
>         c = <optimized out>
>         r = <optimized out>
>         tmp = 0x7ffebd966e7f ""
>         tmp_len = 32673
>         port = 5060
>         proto = 0
>         aproto = 0
>         ahost = 0x0
>         aport = 0
>         options = 0x55d251aa74b8
> ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
>         ret = -1
>         seed = 3209119183
>         rfd = <optimized out>
>         debug_save = <optimized out>
>         debug_flag = <optimized out>
>         dont_fork_cnt = 0
>         n_lst = <optimized out>
>         p = <optimized out>
>         st = {st_dev = 24, st_ino = 46237, st_nlink = 2, st_mode = 16888,
> st_uid = 899, st_gid = 899, __pad0 = 0, st_rdev = 0, st_size = 40,
> st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1742012732, tv_nsec =
> 217328715}, st_mtim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_ctim =
> {tv_sec = 1742012732, tv_nsec = 217328715}, __glibc_reserved = {0, 0, 0}}
>         l1 = <optimized out>
>         tbuf = '\000' <repeats 32 times>,
> "\030\202\237\275\376\177\000\000@\002\000\000@\003\000\000\001", '\000'
> <repeats 23 times>, "@\003\000\000@\003\000\000@\003\000\000@\003\000\000@
> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@
> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@
> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@
> \003\000\000@\003\000\000@\003", '\000' <repeats 11 times>, "\001",
> '\000' <repeats 118 times>...
>         option_index = 12
>         long_options = {{name = 0x55d251a9230b "help", has_arg = 0, flag =
> 0x0, val = 104}, {name = 0x55d251a937a4 "version", has_arg = 0, flag = 0x0,
> val = 118}, {name = 0x55d251a9826a "alias", has_arg = 1, flag = 0x0, val =
> 1024}, {name = 0x55d251a92310 "subst", has_arg = 1, flag = 0x0, val =
> 1025}, {name = 0x55d251a92316 "substdef", has_arg = 1, flag = 0x0, val =
> 1026}, {name = 0x55d251a9231f "substdefs", has_arg = 1, flag = 0x0, val =
> 1027}, {name = 0x55d251a92329 "server-id", has_arg = 1, flag = 0x0, val =
> 1028}, {name = 0x55d251a92333 "loadmodule", has_arg = 1, flag = 0x0, val =
> 1029}, {name = 0x55d251a9233e "modparam", has_arg = 1, flag = 0x0, val =
> 1030}, {name = 0x55d251a92347 "log-engine", has_arg = 1, flag = 0x0, val =
> 1031}, {name = 0x55d251a938c1 "debug", has_arg = 1, flag = 0x0, val =
> 1032}, {name = 0x55d251a92352 "cfg-print", has_arg = 0, flag = 0x0, val =
> 1033}, {name = 0x55d251a9235c "atexit", has_arg = 1, flag = 0x0, val =
> 1034}, {name = 0x55d251a92363 "all-errors", has_arg = 0, flag = 0x0, val =
> 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
>         __func__ = "main"
> [Inferior 1 (process 1782236) detached]
> ---end 1782236 -------------------------------------------------------
>
> ```
>
>
>
> Does anybody have an idea what this could be? I never saw that with
> previous versions, and now we can see it on several instances, but we are
> yet unable to reproduce.
>
>
>
> Not sure if this might be related, but we also see many messages like `Mar
> 17 13:39:03 ttel /usr/sbin/kamailio[2882127]: CRITICAL: <core>
> [core/tcp_main.c:5558]: tcp_timer_check_connections(): message processing
> timeout on connection id: 402 (state: 0) - closing`
>
>
>
> Best regards
>
> Mathias
>
__________________________________________________________
Kamailio - Users Mailing List - Non Commercial Discussions -- 
[email protected]
To unsubscribe send an email to [email protected]
Important: keep the mailing list in the recipients, do not reply only to the 
sender!

Reply via email to