> I'll also give a try with 6.0.1 later. Meanwhile, I've tested 5.8.6 and 6.0.1 and both show the same issue.
Am Do., 20. März 2025 um 10:21 Uhr schrieb Mathias Schneuwly < [email protected]>: > Hi Henning > > > Maybe also check the actual limits by “cat /proc/$kamailio-pid/limits” > Not 100% if its related, but if you are looking to scale this would be for > sure required. > The limit was > Max open files 1024 524288 files > and I changed it now to > Max open files 65536 65536 files > > Are there some recommendations for the limit? Would it make sense to > change that in the provided systemd file if you say that 1024 is not that > much? > > But anyway, it fails with 1024 and with 65536. > > I've also updated to 5.8.5 and it shows the same issue. I'll also give a > try with 6.0.1 later. > > > About the tls_thread_mode – you should use mode 2 if you are running > with openssl 3.x, otherwise you probably will observe random crashes. > We are running OpenSSL 3.2.3 and tls_thread_mode is set to 2. But it still > blocks. > > Best regards > Mathias > > Am Do., 20. März 2025 um 08:42 Uhr schrieb Henning Westerholt < > [email protected]>: > >> Hello Mathias, >> >> >> >> 1024 open files are not that much, you should increase it for the >> kamailio user. Maybe also check the actual limits by “cat >> /proc/$kamailio-pid/limits” >> >> Not 100% if its related, but if you are looking to scale this would be >> for sure required. >> >> >> >> About the tls_thread_mode – you should use mode 2 if you are running with >> openssl 3.x, otherwise you probably will observe random crashes. >> >> >> >> If you are seeing the mentioned problem with 15 phones, this is clearly a >> problem and should be further investigated. >> >> >> >> Cheers, >> >> >> >> Henning >> >> >> >> >> >> *From:* Mathias Schneuwly via sr-users <[email protected]> >> *Sent:* Donnerstag, 20. März 2025 08:07 >> *To:* [email protected] >> *Cc:* Mathias Schneuwly <[email protected]> >> *Subject:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: >> Too many references: cannot splice (109) >> >> >> >> Hi >> >> >> >> Thanks for the reply. We are using the following limits and I guess they >> are the default values >> >> real-time non-blocking time (microseconds, -R) unlimited >> core file size (blocks, -c) unlimited >> data seg size (kbytes, -d) unlimited >> scheduling priority (-e) 0 >> file size (blocks, -f) unlimited >> pending signals (-i) 15666 >> max locked memory (kbytes, -l) 8192 >> max memory size (kbytes, -m) unlimited >> open files (-n) 1024 >> pipe size (512 bytes, -p) 8 >> POSIX message queues (bytes, -q) 819200 >> real-time priority (-r) 0 >> stack size (kbytes, -s) 8192 >> cpu time (seconds, -t) unlimited >> max user processes (-u) 15666 >> virtual memory (kbytes, -v) unlimited >> file locks (-x) unlimited >> >> >> >> But to me, it looks more like a deadlock somewhere around SSL_accept. >> According to the stack trace generated with "kamctl trap", all 8 TCP >> receiver processes are waiting in "#1 0x00007fa14ca04830 in >> pthread_rwlock_wrlock () from /usr/lib/libc.so.6". I guess the message "Too >> many references: cannot splice (109)" is just the consequence as the main >> process is no longer able to dispatch jobs to the child workers. >> >> >> >> Meanwhile, I'm able to reproduce the issue. It requires a bit of brute >> force, but it deadlocks reliably after a few seconds. >> >> >> >> I'm flooding Kamailio with register messages using sipexer. I'm running >> the following command 3 times in parallel and always after a few seconds, >> Kamailio is completely blocked. >> >> watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2" -au >> "2" -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru >> sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer >> -register -vl 3 -co -com -ex 0 -fu "2" -au "2" -ap "test" -tk <path to >> key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu >> sip:2@<domain> tls:<fqdn>:5061" >> >> >> >> I played a bit with the parameters "modparam("tls", "init_mode", 1)" and >> "tls_threads_mode = 2", but it didn't seem to change anything. >> >> >> >> Please have in mind that we usually do not stress Kamailio that much. We >> have seen the issue a couple of times with around 5 to 15 SIP phones >> registered. But in this usual setup, the problem does not occur reliably. >> >> >> >> Best regards >> >> Mathias >> >> >> >> Am Di., 18. März 2025 um 21:53 Uhr schrieb Henning Westerholt < >> [email protected]>: >> >> Hello, >> >> >> >> check the ulimit settings for the kamailio user and general resource >> usage on that system. In development branch there is also a feature present >> to print the actual ulimits on startup. >> >> >> >> Cheers, >> >> >> >> Henning >> >> >> >> *From:* Mathias Schneuwly via sr-users <[email protected]> >> *Sent:* Montag, 17. März 2025 13:41 >> *To:* [email protected] >> *Cc:* Mathias Schneuwly <[email protected]> >> *Subject:* [SR-Users] send_fd(): sendmsg failed sending 55 on 52: Too >> many references: cannot splice (109) >> >> >> >> Hi >> >> >> >> We are using kamailio 5.8.4. >> >> >> >> Recently, we noticed, that after a few days, we see the following >> messages from Kamailio `2025 Mar 17 10:22:11 ttel CRITICAL: <core> >> [core/pass_fd.c:193]: send_fd(): sendmsg failed sending 55 on 52: Too many >> references: cannot splice (109)`. >> >> >> >> When this happens, devices are no longer able to register. At the moment >> we have no idea what might be wrong. >> >> >> >> Unfortunately, the internet does not reveal much information regarding >> this issue. In one issue related to Kamailio ( >> https://lists.kamailio.org/mailman3/hyperkitty/list/[email protected]/thread/FKN25CGV27KA3GTS3ED3E7V3WKRZALRP/), >> I found the command `kamctl trap` and when I execute that, I see one issue, >> which might be related, but I'm just guessing... >> >> >> >> ``` >> >> ---start 1782236 ----------------------------------------------------- >> [Thread debugging using libthread_db enabled] >> Using host libthread_db library "/usr/lib/libthread_db.so.1". >> 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6 >> $1 = 26 >> $2 = {pid = 1782236, unix_sock = 36, idx = 0, status = 1, rank = 17, desc >> = "tcp receiver (generic) child=0", '\000' <repeats 97 times>} >> #0 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6 >> No symbol table info available. >> #1 0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6 >> No symbol table info available. >> #2 0x00007fa14ac20b49 in CRYPTO_THREAD_write_lock () from >> /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #3 0x00007fa14adbd5da in ?? () from /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #4 0x00007fa14adbd9ca in ?? () from /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #5 0x00007fa14adbc907 in ?? () from /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #6 0x00007fa14adbd26c in ?? () from /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #7 0x00007fa14adbd68e in ?? () from /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #8 0x00007fa14abe75c6 in ?? () from /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #9 0x00007fa14abe840b in EVP_RAND_generate () from >> /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #10 0x00007fa14acf9282 in RAND_bytes_ex () from /usr/lib/libcrypto.so.3 >> No symbol table info available. >> #11 0x00007fa14b06cce8 in ?? () from /usr/lib/libssl.so.3 >> No symbol table info available. >> #12 0x00007fa14b05bdf7 in ?? () from /usr/lib/libssl.so.3 >> No symbol table info available. >> #13 0x00007fa14b0ef036 in tls_accept (c=c@entry=0x7fa1475c0dc0, >> error=error@entry=0x7ffebd9443b8) at tls_server.c:457 >> ret = <optimized out> >> ssl = warning: could not convert 'ssl_st' from the host encoding >> (ANSI_X3.4-1968) to UTF-32. >> This normally should not happen, please file a bug report. >> 0x7fa147668bd0 >> cert = <optimized out> >> tls_c = 0x7fa14768f490 >> tls_log = <optimized out> >> __func__ = "tls_accept" >> #14 0x00007fa14b0f7540 in tls_h_read_f (c=0x7fa1475c0dc0, >> flags=0x7ffebd9647d0) at tls_server.c:1153 >> r = 0x7fa1475c0ee8 >> bytes_free = 16383 >> bytes_read = <optimized out> >> read_size = <optimized out> >> ssl_error = 0 >> ssl_read = 0 >> ssl = 0x7fa147668bd0 >> rd_buf = >> "\026\003\003\004\016\v\000\004\n\000\004\a\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\026\0270\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006\003U\004\003\f\versaimfs.ch0\036\027\r230302163519Z\027\r291230163519Z0`1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003"... >> wr_buf = >> "\026\003\003\000A\002\000\000=\003\003\371\204\326\361\262-9\300\344gV\2316\373\235D\314\225u8\v}\215\254DOWNGRD\001\000\3000\000\000\025\377\001\000\001\000\000\v\000\004\003\000\001\002\000#\000\000\000\027\000\000\026\003\003\a\267\v\000\a\263\000\a\260\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\r>0\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006"... >> rd = {buf = 0x7ffebd944480 "\026\003\003\004\016\v", pos = 1405, >> used = 1405, size = 65536} >> wr = {buf = 0x7ffebd954480 "\026\003\003", pos = 0, used = 0, >> size = 65536} >> tls_c = 0x7fa14768f490 >> enc_rd_buf = <optimized out> >> n = 0 >> flush_flags = 0 >> err_src = <optimized out> >> ip_buf = '\000' <repeats 63 times> >> x = <optimized out> >> tls_dbg = <optimized out> >> __func__ = "tls_h_read_f" >> redo_read = <optimized out> >> continue_ssl_read = <optimized out> >> __llevel = <optimized out> >> __kld = <optimized out> >> __llevel = <optimized out> >> __kld = <optimized out> >> #15 0x000055d2519871b4 in tcp_read_headers (c=c@entry=0x7fa1475c0dc0, >> read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:445 >> bytes = <optimized out> >> remaining = <optimized out> >> p = <optimized out> >> r = 0x7fa1475c0ee8 >> mc = <optimized out> >> body_len = <optimized out> >> tvnow = {tv_sec = 1742100857, tv_usec = 944923} >> tvdiff = <optimized out> >> mfline = <optimized out> >> mtransid = <optimized out> >> __func__ = "tcp_read_headers" >> #16 0x000055d251989cef in tcp_read_req (con=0x7fa1475c0dc0, >> bytes_read=bytes_read@entry=0x7ffebd9647c8, >> read_flags=read_flags@entry=0x7ffebd9647d0) >> at core/tcp_read.c:1509 >> bytes = <optimized out> >> total_bytes = 0 >> resp = 1 >> size = <optimized out> >> req = 0x7fa1475c0ee8 >> dst = {send_sock = 0x55d251a91020, to = {s = {sa_family = 52024, >> sa_data = "\257Q\322U\000\000\002\000\000\000\376\177\000"}, sin = >> {sin_family = 52024, sin_port = 20911, sin_addr = {s_addr = 21970}, >> sin_zero = "\002\000\000\000\376\177\000"}, sin6 = {sin6_family = 52024, >> sin6_port = 20911, sin6_flowinfo = 21970, sin6_addr = {__in6_u = >> {__u6_addr8 = >> "\002\000\000\000\376\177\000\000\030\000\000\000\000\000\000", __u6_addr16 >> = {2, 0, 32766, 0, 24, 0, 0, 0}, __u6_addr32 = {2, 32766, 24, 0}}}, >> sin6_scope_id = 0}, sas = {ss_family = 52024, __ss_padding = >> "\257Q\322U\000\000\002\000\000\000\376\177\000\000\030", '\000' <repeats >> 23 times>, >> "\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\262+\226Q\001 >> \000\000\200,5L\241\177\000\000 \020\251Q\322U\000\000 >> \020\251Q\322U\000\000\200\303\257Q\322U\000\00009SG\241\177\000\000\330G\226\275\376\177\000\000\b\000\000\000\000\000\000", >> __ss_align = 20}}, id = 1, send_flags = {f = 1, blst_imask = 0}, proto = 14 >> '\016', proto_pad0 = 0 '\000', proto_pad1 = 0} >> c = <optimized out> >> ret = <optimized out> >> again = <optimized out> >> __func__ = "tcp_read_req" >> #17 0x000055d25198ebec in handle_io (fm=fm@entry=0x7fa14c352c80, >> events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1906 >> ret = 8 >> n = 0 >> read_flags = RD_CONN_SHORT_READ >> con = 0x7fa1475c0dc0 >> s = 14 >> resp = <optimized out> >> t = <optimized out> >> ee = 0x0 >> __func__ = "handle_io" >> error = <optimized out> >> #18 0x000055d25199469b in io_wait_loop_epoll (repeat=repeat@entry=0, >> t=2, h=0x55d251c49d60 <io_w>) at core/io_wait.h:1075 >> n = 1 >> r = 0 >> fm = 0x7fa14c352c80 >> revents = 1 >> __func__ = "io_wait_loop_epoll" >> #19 0x000055d251994e87 in tcp_receive_loop (unix_sock=<optimized out>) at >> core/tcp_read.c:2026 >> __func__ = "tcp_receive_loop" >> #20 0x000055d25197e72c in tcp_init_children >> (woneinit=woneinit@entry=0x7ffebd964d1c) >> at core/tcp_main.c:5391 >> r = <optimized out> >> i = <optimized out> >> reader_fd_1 = 39 >> pid = <optimized out> >> si_desc = "tcp receiver >> (generic)\000\000`M\226\275\376\177\000\000\001\000\000\000\000\000\000\000\210\035\223Q\322U\000\000 >> \020\251Q\322U\000\000\220\315\257Q\322U\000\000\220\242/L\241\177\000\000\025\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000%\020\251Q\322U\000\000HU\257Q\322U\000\000e\005\000\000\000\000\000\000 >> \020\251Q\322U\000\000\3750\235Q\322U\000" >> si = <optimized out> >> __func__ = "tcp_init_children" >> error = <optimized out> >> #21 0x000055d2517c6c7e in main_loop () at main.c:1950 >> i = <optimized out> >> pid = <optimized out> >> si = <optimized out> >> si_desc = "udp receiver child=7 >> sock=10.0.90.1:5060\000\000\000\000\322U\000\000?\342\017K\241\177\000\000\022\000\000\000\377\377\377\377\200\006\265L\241\177\000\000\253m\237L\241\177\000\000\361|\003\000\000\000\000\000\t\000\000\000\000\000\000\000Sat >> Mar 15 05:25:32 2025\n\000\264L\241\177\000" >> nrprocs = <optimized out> >> woneinit = 1 >> __func__ = "main_loop" >> error = <optimized out> >> #22 0x000055d2517bc151 in main (argc=<optimized out>, argv=<optimized >> out>) at main.c:3256 >> cfg_stream = <optimized out> >> c = <optimized out> >> r = <optimized out> >> tmp = 0x7ffebd966e7f "" >> tmp_len = 32673 >> port = 5060 >> proto = 0 >> aproto = 0 >> ahost = 0x0 >> aport = 0 >> options = 0x55d251aa74b8 >> ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" >> ret = -1 >> seed = 3209119183 >> rfd = <optimized out> >> debug_save = <optimized out> >> debug_flag = <optimized out> >> dont_fork_cnt = 0 >> n_lst = <optimized out> >> p = <optimized out> >> st = {st_dev = 24, st_ino = 46237, st_nlink = 2, st_mode = 16888, >> st_uid = 899, st_gid = 899, __pad0 = 0, st_rdev = 0, st_size = 40, >> st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1742012732, tv_nsec = >> 217328715}, st_mtim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_ctim = >> {tv_sec = 1742012732, tv_nsec = 217328715}, __glibc_reserved = {0, 0, 0}} >> l1 = <optimized out> >> tbuf = '\000' <repeats 32 times>, >> "\030\202\237\275\376\177\000\000@\002\000\000@\003\000\000\001", '\000' >> <repeats 23 times>, "@\003\000\000@\003\000\000@\003\000\000@ >> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ >> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ >> \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ >> \003\000\000@\003\000\000@\003\000\000@\003", '\000' <repeats 11 times>, >> "\001", '\000' <repeats 118 times>... >> option_index = 12 >> long_options = {{name = 0x55d251a9230b "help", has_arg = 0, flag >> = 0x0, val = 104}, {name = 0x55d251a937a4 "version", has_arg = 0, flag = >> 0x0, val = 118}, {name = 0x55d251a9826a "alias", has_arg = 1, flag = 0x0, >> val = 1024}, {name = 0x55d251a92310 "subst", has_arg = 1, flag = 0x0, val = >> 1025}, {name = 0x55d251a92316 "substdef", has_arg = 1, flag = 0x0, val = >> 1026}, {name = 0x55d251a9231f "substdefs", has_arg = 1, flag = 0x0, val = >> 1027}, {name = 0x55d251a92329 "server-id", has_arg = 1, flag = 0x0, val = >> 1028}, {name = 0x55d251a92333 "loadmodule", has_arg = 1, flag = 0x0, val = >> 1029}, {name = 0x55d251a9233e "modparam", has_arg = 1, flag = 0x0, val = >> 1030}, {name = 0x55d251a92347 "log-engine", has_arg = 1, flag = 0x0, val = >> 1031}, {name = 0x55d251a938c1 "debug", has_arg = 1, flag = 0x0, val = >> 1032}, {name = 0x55d251a92352 "cfg-print", has_arg = 0, flag = 0x0, val = >> 1033}, {name = 0x55d251a9235c "atexit", has_arg = 1, flag = 0x0, val = >> 1034}, {name = 0x55d251a92363 "all-errors", has_arg = 0, flag = 0x0, val = >> 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} >> __func__ = "main" >> [Inferior 1 (process 1782236) detached] >> ---end 1782236 ------------------------------------------------------- >> >> ``` >> >> >> >> Does anybody have an idea what this could be? I never saw that with >> previous versions, and now we can see it on several instances, but we are >> yet unable to reproduce. >> >> >> >> Not sure if this might be related, but we also see many messages like >> `Mar 17 13:39:03 ttel /usr/sbin/kamailio[2882127]: CRITICAL: <core> >> [core/tcp_main.c:5558]: tcp_timer_check_connections(): message processing >> timeout on connection id: 402 (state: 0) - closing` >> >> >> >> Best regards >> >> Mathias >> >>
__________________________________________________________ Kamailio - Users Mailing List - Non Commercial Discussions -- [email protected] To unsubscribe send an email to [email protected] Important: keep the mailing list in the recipients, do not reply only to the sender!
