Hi Alexander,

Quick reply.
Nice bug report! Agreed that it looks like vl_api_clnt_process sleeps,
probably because it hit a queue size of 0, but memclnt_queue_callback or
the timeout (albeit 20s is a lot) should wake it up. So, given that
QUEUE_SIGNAL_EVENT is set, the only thing that comes to mind is that maybe
the vlib_process_signal_event context somehow gets corrupted. Could you
run a debug image and see if anything asserts? Is vlib_process_signal_event
by any chance called from a worker?
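
If you want to check the latter quickly, a temporary assertion on the
signaling path in a debug image would fire immediately. A minimal sketch
(adding it at the top of vlib_process_signal_event is my suggestion, not
existing code):

  /* Process nodes and the timing wheel live on the main thread, so
   * signaling a process node directly from a worker is unsafe. In a
   * debug image this assert catches any worker-thread caller. */
  ASSERT (vlib_get_thread_index () == 0);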
Regards,
Florin

> On Jan 24, 2023, at 7:59 AM, Alexander Chernavin via lists.fd.io
> <achernavin=netgate....@lists.fd.io> wrote:
>
> Hello all,
>
> We are experiencing VPP crashes that occur a few days after startup
> because of API segment exhaustion. Increasing the API segment size to
> 256MB didn't stop the crashes from occurring.
>
> Can you please take a look at the description below and tell us if you
> have seen similar issues or have any ideas what the cause may be?
>
> Given:
> VPP 22.10
> 2 worker threads
> API segment size is 256MB
> ~893k IPv4 routes and ~160k IPv6 routes added
>
> Backtrace:
>> [..]
>> #32660 0x000055b02f606896 in os_panic ()
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vpp/vnet/main.c:414
>> #32661 0x00007fce3c0ec740 in clib_mem_heap_alloc_inline (heap=0x0,
>>     size=<optimized out>, align=8, os_out_of_memory_on_failure=1)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vppinfra/mem_dlmalloc.c:613
>> #32662 clib_mem_alloc (size=<optimized out>)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vppinfra/mem_dlmalloc.c:628
>> #32663 0x00007fce3dc4ee6f in vl_msg_api_alloc_internal (vlib_rp=0x130026000,
>>     nbytes=69, pool=0, may_return_null=0)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memory_shared.c:179
>> #32664 0x00007fce3dc592cd in vl_api_rpc_call_main_thread_inline
>>     (force_rpc=0 '\000', fp=<optimized out>, data=<optimized out>,
>>     data_length=<optimized out>)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memclnt_api.c:617
>> #32665 vl_api_rpc_call_main_thread (fp=0x7fce3c74de70 <ip_neighbor_learn>,
>>     data=0x7fcc372bdc00 "& \001$ ", data_length=28)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memclnt_api.c:641
>> #32666 0x00007fce3cc7fe2d in icmp6_neighbor_solicitation_or_advertisement
>>     (vm=0x7fccc0864000, frame=0x7fcccd7d2d40, is_solicitation=1,
>>     node=<optimized out>)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vnet/ip6-nd/ip6_nd.c:163
>> #32667 icmp6_neighbor_solicitation (vm=0x7fccc0864000, node=0x7fccc09e3380,
>>     frame=0x7fcccd7d2d40)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vnet/ip6-nd/ip6_nd.c:322
>> #32668 0x00007fce3c1a2fe0 in dispatch_node (vm=0x7fccc0864000,
>>     node=0x7fce3dc74836, type=VLIB_NODE_TYPE_INTERNAL,
>>     dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fcccd7d2d40,
>>     last_time_stamp=4014159654296481)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:961
>> #32669 dispatch_pending_node (vm=0x7fccc0864000, pending_frame_index=7,
>>     last_time_stamp=4014159654296481)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1120
>> #32670 vlib_main_or_worker_loop (vm=0x7fccc0864000, is_main=0)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1589
>> #32671 vlib_worker_loop (vm=vm@entry=0x7fccc0864000)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1723
>> #32672 0x00007fce3c1f581a in vlib_worker_thread_fn (arg=0x7fccbdb11b40)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/threads.c:1579
>> #32673 0x00007fce3c1f02c1 in vlib_worker_thread_bootstrap_fn
>>     (arg=0x7fccbdb11b40)
>>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/threads.c:418
>> #32674 0x00007fce3be3db43 in start_thread (arg=<optimized out>)
>>     at ./nptl/pthread_create.c:442
>> #32675 0x00007fce3becfa00 in clone3 ()
>>     at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
> According to the backtrace, an IPv6 neighbor is being learned. Since the
> packet was received on a worker thread, the neighbor information is
> passed to the main thread by making an RPC call (which works via the
> API). For this, an API message for the RPC call is allocated from the
> API segment (as a client). But the allocation fails because there is no
> available memory.
>
> If we inspect the API rings after the crash, we can see that they are
> all filled with VL_API_RPC_CALL messages. We can also see that there are
> a lot of pending RPC requests (vm->pending_rpc_requests has ~3.3M
> items). Thus, API segment exhaustion occurs because of a huge number of
> pending RPC messages.
>
> RPC messages are processed in a process node called api-rx-from-ring
> (the function is vl_api_clnt_process), and process nodes run in the
> main thread only.
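>
> For illustration, the worker-side enqueue path visible in frames
> #32664-#32665 boils down to roughly the following (a sketch, not the
> exact code; learn_args stands in for the ip_neighbor_learn argument
> struct):
>
>   /* Copy the handler function pointer and its argument data into an
>    * API message allocated from the shared API segment and queue it
>    * for the main thread. The allocation is what fails above. */
>   vl_api_rpc_call_main_thread (ip_neighbor_learn, (u8 *) &learn_args,
>                                sizeof (learn_args));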
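>
> On the consuming side, the shape of a process node like
> vl_api_clnt_process is roughly the following (a simplified sketch, with
> vm being the main thread's vlib_main_t; the real function has more
> states):
>
>   uword event_type, *event_data = 0;
>
>   while (1)
>     {
>       /* Sleep until an event (e.g. QUEUE_SIGNAL_EVENT) is signaled or
>        * the suspend timer expires, whichever comes first. */
>       vlib_process_wait_for_event_or_clock (vm, 20.0);
>       event_type = vlib_process_get_events (vm, &event_data);
>
>       if (event_type == ~0)
>         ; /* timer expired: periodic housekeeping */
>
>       /* in either case: drain queued API and RPC messages here */
>       vec_reset_length (event_data);
>     }
>
> If neither the event nor the timer ever wakes the node up, pending RPC
> messages keep accumulating, which matches what we observe below.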
> The first hypothesis is that the main loop of the main thread pauses
> for so long that a huge number of pending RPC messages are accumulated
> by the worker threads (which keep running). But this doesn't seem to be
> confirmed by inspecting vm->loop_interval_start of all threads after
> the crash: if the main thread had stalled, vm->loop_interval_start of
> the worker threads would have been greater than that of the main
> thread, which is not the case.
>> (gdb) p vlib_global_main.vlib_mains[0]->loop_interval_start
>> $117 = 197662.55595008997
>> (gdb) p vlib_global_main.vlib_mains[1]->loop_interval_start
>> $119 = 197659.82887979984
>> (gdb) p vlib_global_main.vlib_mains[2]->loop_interval_start
>> $121 = 197659.93944517447
>
> The second hypothesis is that pending RPC messages stop being processed
> completely at some point and keep accumulating for as long as memory
> permits. This seems to be confirmed by inspecting the process node
> after the crash: vm->main_loop_count is much bigger than the process
> node's main_loop_count_last_dispatch (the difference is ~50M
> iterations). Yet, according to the flags, the node is waiting for a
> timer or an event, and it was suspended for 20s. Also,
> vm->queue_signal_pending is set, which means that QUEUE_SIGNAL_EVENT
> was signaled and the process node was supposed to be executed shortly.
> But such a big difference since the last execution of the node seems to
> indicate that it wasn't executed by the timer either.
>> (gdb) p vlib_global_main.vlib_mains[0]->main_loop_count
>> $115 = 1696611558
>> (gdb) p vlib_global_main.vlib_mains[0]->queue_signal_pending
>> $116 = 1
>> (gdb) p *vlib_global_main.vlib_mains[0]->node_main.processes[26]
>> $122 = {[...], function = 0x7fce3dc57260 <vl_api_clnt_process>,
>>     [...], main_loop_count_last_dispatch = 1646632623,
>>     [...], flags = 11,
>>     [...], resume_clock_interval = 2000000,
>>     [...]}
>
> What is also worth noting is that if we look at which stop_timer_handle
> the process node keeps, we can see that the timer doesn't belong to it;
> in fact, it belongs to another process node. It might look like the
> timer was reused. And if we compare the values of
> main_loop_count_last_dispatch, after one crash it does look like a
> reuse, but after another one it doesn't. For example, in this case it
> doesn't look like a reuse because 1,646,632,623 > 1,552,541,082.
>> (gdb) p *vlib_global_main.vlib_mains[0]->node_main.processes[26]
>> $254 = {[...], function = 0x7fce3dc57260 <vl_api_clnt_process>,
>>     [...], main_loop_count_last_dispatch = 1646632623,
>>     [...], stop_timer_handle = 3087,
>>     [...]}
>> (gdb) p ((tw_timer_wheel_1t_3w_1024sl_ov_t*)
>>     vlib_global_main.vlib_mains[0]->node_main.timing_wheel)->timers[3087]
>> $255 = {next = 3095, prev = 3074, {{fast_ring_offset = 287,
>>     slow_ring_offset = 356}, expiration_time = 18446744069437915423},
>>     user_handle = 84}
>> (gdb) p *vlib_global_main.vlib_mains[0]->node_main.processes[42]
>> $256 = {[...], function = 0x7fce3c638700 <ip4_sv_reass_walk_expired>,
>>     [...], main_loop_count_last_dispatch = 1552541082,
>>     [...], stop_timer_handle = 3087,
>>     [...]}
>
> Regards,
> Alexander