Re: [vpp-dev] VPP crashes because of API segment exhaustion
Hello Florin,

> Agreed that it looks like vl_api_clnt_process sleeps, probably because it
> hit a queue size of 0, but memclnt_queue_callback or the timeout, albeit
> 20s is a lot, should wake it up.

It doesn't look like vl_api_clnt_process would have woken up later. Firstly, QUEUE_SIGNAL_EVENT had already been signaled and vm->queue_signal_pending was set. memclnt_queue_callback() is only triggered while vm->queue_signal_pending is unset, so no new calls to memclnt_queue_callback() could have happened while vm->queue_signal_pending remained set. Secondly, the timer id that vl_api_clnt_process holds belongs to another process node. Even if the timer were still valid, it would have triggered that other process node.

> So, given that QUEUE_SIGNAL_EVENT is set, the only thing that comes to
> mind is that maybe somehow vlib_process_signal_event context gets
> corrupted. Could you run a debug image and see if anything asserts? Is
> vlib_process_signal_event called by chance from a worker?

It's problematic to run a debug image of VPP on the affected instances. There are no signs of vlib_process_signal_event() being called from a worker thread: looking at memclnt_queue_callback(), it is called only in the main thread.

Regards,
Alexander
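For reference, the wake-up path in question looks roughly like this (a trimmed, paraphrased sketch of src/vlib/main.c and src/vlibmemory/memclnt_api.c from VPP 22.10; not verbatim code):

    /* Main loop (src/vlib/main.c): the callback that can signal
     * QUEUE_SIGNAL_EVENT is skipped while queue_signal_pending is set. */
    if (PREDICT_FALSE (vm->queue_signal_pending == 0))
      vm->queue_signal_callback (vm); /* -> memclnt_queue_callback () */

    /* memclnt_queue_callback (src/vlibmemory/memclnt_api.c): if any
     * client input queue is non-empty, set queue_signal_pending and
     * wake the api-rx-from-ring process node. */
    static void
    memclnt_queue_callback (vlib_main_t *vm)
    {
      int i;

      for (i = 0; i < vec_len (vl_api_queue_cursizes); i++)
        if (*vl_api_queue_cursizes[i])
          {
            vm->queue_signal_pending = 1;
            vm->api_queue_nonempty = 1;
            vlib_process_signal_event (vm, vl_api_clnt_node.index,
                                       /* event type */ QUEUE_SIGNAL_EVENT,
                                       /* event data */ 0);
            break;
          }
    }

So once queue_signal_pending is stuck at 1 and the process node never runs to clear it, no further signals can be generated.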
[vpp-dev] VPP crashes because of API segment exhaustion
Hello all,

We are experiencing VPP crashes that occur a few days after startup because of API segment exhaustion. Increasing the API segment size to 256MB didn't stop the crashes from occurring. Can you please take a look at the description below and tell us if you have seen similar issues or have any ideas what the cause may be?

Given:
- VPP 22.10
- 2 worker threads
- API segment size is 256MB
- ~893k IPv4 routes and ~160k IPv6 routes added

Backtrace:

> [..]
> #32660 0x55b02f606896 in os_panic ()
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vpp/vnet/main.c:414
> #32661 0x7fce3c0ec740 in clib_mem_heap_alloc_inline (heap=0x0,
>     size=<optimized out>, align=8, os_out_of_memory_on_failure=1)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vppinfra/mem_dlmalloc.c:613
> #32662 clib_mem_alloc (size=<optimized out>)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vppinfra/mem_dlmalloc.c:628
> #32663 0x7fce3dc4ee6f in vl_msg_api_alloc_internal (vlib_rp=0x130026000,
>     nbytes=69, pool=0, may_return_null=0)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memory_shared.c:179
> #32664 0x7fce3dc592cd in vl_api_rpc_call_main_thread_inline (
>     force_rpc=0 '\000', fp=<optimized out>, data=<optimized out>,
>     data_length=<optimized out>)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memclnt_api.c:617
> #32665 vl_api_rpc_call_main_thread (fp=0x7fce3c74de70,
>     data=0x7fcc372bdc00 "& \001$ ", data_length=28)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memclnt_api.c:641
> #32666 0x7fce3cc7fe2d in icmp6_neighbor_solicitation_or_advertisement (
>     vm=0x7fccc0864000, frame=0x7fcccd7d2d40, is_solicitation=1,
>     node=<optimized out>)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vnet/ip6-nd/ip6_nd.c:163
> #32667 icmp6_neighbor_solicitation (vm=0x7fccc0864000, node=0x7fccc09e3380,
>     frame=0x7fcccd7d2d40)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vnet/ip6-nd/ip6_nd.c:322
> #32668 0x7fce3c1a2fe0 in dispatch_node (vm=0x7fccc0864000,
>     node=0x7fce3dc74836, type=VLIB_NODE_TYPE_INTERNAL,
>     dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fcccd7d2d40,
>     last_time_stamp=4014159654296481)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:961
> #32669 dispatch_pending_node (vm=0x7fccc0864000, pending_frame_index=7,
>     last_time_stamp=4014159654296481)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1120
> #32670 vlib_main_or_worker_loop (vm=0x7fccc0864000, is_main=0)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1589
> #32671 vlib_worker_loop (vm=vm@entry=0x7fccc0864000)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1723
> #32672 0x7fce3c1f581a in vlib_worker_thread_fn (arg=0x7fccbdb11b40)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/threads.c:1579
> #32673 0x7fce3c1f02c1 in vlib_worker_thread_bootstrap_fn (arg=0x7fccbdb11b40)
>     at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/threads.c:418
> #32674 0x7fce3be3db43 in start_thread (arg=<optimized out>)
>     at ./nptl/pthread_create.c:442
> #32675 0x7fce3becfa00 in clone3 ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

According to the backtrace, an IPv6 neighbor is being learned. Since the packet was received on a worker thread, the neighbor information is passed to the main thread by making an RPC call (which works via the API). For this, an API message for the RPC call is allocated from the API segment (as a client), and the allocation fails because no memory is available. Inspecting the API rings after the crash shows that they are all filled with VL_API_RPC_CALL messages. It can also be seen that there are a lot of pending RPC requests (vm->pending_rpc_requests has ~3.3M items).
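The allocation and queuing step visible in frames #32663..#32665 looks roughly like this (a trimmed, paraphrased sketch of vl_api_rpc_call_main_thread_inline in src/vlibmemory/memclnt_api.c; not verbatim code):

    /* Called from a worker: allocate the RPC message from the API
     * segment "as if" by a client, append it to the global pending-RPC
     * vector, and let the main thread pick it up. */
    vl_api_rpc_call_t *mp;
    vlib_main_t *vm_global = vlib_get_first_main ();

    /* may_return_null == 0 here, so segment exhaustion ends in os_panic () */
    mp = vl_msg_api_alloc_as_if_client (sizeof (*mp) + data_length);
    clib_memset (mp, 0, sizeof (*mp));
    clib_memcpy_fast (mp->data, data, data_length);
    mp->_vl_msg_id = ntohs (VL_API_RPC_CALL);
    mp->function = pointer_to_uword (fp);

    /* Nothing bounds this vector: if the main thread stops draining it,
     * workers keep appending until the API segment is exhausted. */
    clib_spinlock_lock_if_init (&vm_global->pending_rpc_lock);
    vec_add1 (vm_global->pending_rpc_requests, (uword) mp);
    clib_spinlock_unlock_if_init (&vm_global->pending_rpc_lock);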
Thus, API segment exhaustion occurs because of a huge number of pending RPC messages. RPC messages are processed in a process node called api-rx-from-ring (the function is vl_api_clnt_process), and process nodes are handled in the main thread only.

The first hypothesis is that the main loop of the main thread pauses for so long that a huge number of pending RPC messages are accumulated by the worker threads (which keep running). But this doesn't seem to be confirmed by inspecting vm->loop_interval_start of all threads after the crash: if that were the case, vm->loop_interval_start of the worker threads would have been greater than that of the main thread, and it is not:

> (gdb) p vlib_global_main.vlib_mains[0]->loop_interval_start
> $117 = 197662.55595008997
> (gdb) p vlib_global_main.vlib_mains[1]->loop_interval_start
> $119 = 197659.82887979984
> (gdb) p vlib_global_main.vlib_mains[2]->loop_interval_start
> $121 = 197659.93944517447

The second hypothesis is that pending RPC messages stop being processed entirely at some point and keep accumulating for as long as memory permits. This seems to be confirmed by inspecting the process node after the crash: vm->main_loop_count is much bigger than the process node's main_loop_count_last_dispatch (the difference is ~50M iterations), although, according to the flags, the node is waiting for a timer and/or an event.
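For context, the process node's loop is shaped roughly like this (a trimmed, paraphrased sketch of vl_api_clnt_process in src/vlibmemory/memclnt_api.c; the real code computes sleep_time dynamically and handles several event types):

    static uword
    vl_api_clnt_process (vlib_main_t *vm, vlib_node_runtime_t *node,
                         vlib_frame_t *f)
    {
      f64 sleep_time = 10.0; /* adjusted at runtime, up to ~20s */

      while (1)
        {
          /* Suspend until QUEUE_SIGNAL_EVENT arrives or the clock fires.
           * If neither ever happens, the node never dispatches again and
           * main_loop_count_last_dispatch stops advancing. */
          vlib_process_wait_for_event_or_clock (vm, sleep_time);

          vm->queue_signal_pending = 0;
          vlib_process_get_events (vm, /* don't need the events */ 0);

          /* Drain RPCs queued by the workers, then regular API messages. */
          vl_mem_api_handle_rpc (vm, node);
          vl_mem_api_handle_msg_main (vm, node);
        }
      return 0;
    }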
Re: [vpp-dev] NAT ED empty users dump #nat #nat44
Ole,

OK, so nat44_user_dump is not going to return anything in NAT ED mode. nat44_user_session_dump has required fields (ip_address and vrf_id) that don't allow dumping all the sessions; if those fields were made optional, that should work. Adding optional sort and limit fields is a good idea and might be helpful.

Thanks,
Alexander
Re: [vpp-dev] NAT ED empty users dump #nat #nat44
Hello Ole,

I'm not sure I get your question right. The use case is being able to see NAT pool utilization and to debug NAT sessions. I don't think it's a particularly specific use case.

NAT44 ED sessions:
thread 0 vpp_main: 3 sessions
  i2o 10.255.10.100 proto icmp port 1593 fib 0
  o2i 10.100.200.14 proto icmp port 16253 fib 0
    external host 10.255.30.100:0
    index 0
    last heard 27.67
    total pkts 8, total bytes 728
    dynamic translation
  i2o 10.255.10.100 proto udp port 45177 fib 0
  o2i 10.100.200.14 proto udp port 18995 fib 0
    external host 10.255.30.100:8161
    index 1
    last heard 32.66
    total pkts 2, total bytes 106
    dynamic translation
  i2o 10.255.10.100 proto tcp port 59664 fib 0
  o2i 10.100.200.14 proto tcp port 53893 fib 0
    external host 10.255.30.100:22
    index 2
    last heard 36.64
    total pkts 9, total bytes 635
    dynamic translation

The way I see it: there was an API that worked for both ED and non-ED NAT modes (except deterministic). The ED mode logic has changed, but the API remains the same. It still works for the non-ED NAT modes and has stopped working for ED mode. I think that's inconsistent.

Thanks,
Alexander
Re: [vpp-dev] NAT ED empty users dump #nat #nat44
Klement,

I would prefer the existing API to keep working. I expect millions of sessions, and it's clear that dumping them all is a blocker, but during debugging there are not that many of them.

Thanks,
Alexander
Re: [vpp-dev] NAT ED empty users dump #nat #nat44
Klement,

Basically, to print statistics and debug info: the number of users, how many sessions each user consumes, and which session was created for which communication.

Thanks,
Alexander
Re: [vpp-dev] NAT ED empty users dump #nat #nat44
Hello Klement,

I want to list all NAT sessions. To do that, I used to call VL_API_NAT44_USER_DUMP. After that, I had all the users and could call VL_API_NAT44_USER_SESSION_DUMP to get the sessions for every user. Now VL_API_NAT44_USER_DUMP returns nothing in ED mode, so I don't know what the users are. At the same time, VL_API_NAT44_USER_SESSION_DUMP requires the ip_address and vrf_id arguments. So if you don't know the users, you cannot get the sessions.

Thanks,
Alexander
[vpp-dev] NAT ED empty users dump #nat #nat44
Hello,

As I understand it, the "users" concept has been removed from NAT ED, and vl_api_nat44_user_dump_t now returns nothing in ED mode. vl_api_nat44_user_session_dump_t returns sessions only if you know the user you are requesting sessions for, but you can't get the user list. Therefore this chain no longer works: dump all users, then dump all sessions of those users.

I think the user dump code could build the user list from the sessions, collecting these fields: IP address, VRF id, and the numbers of static and dynamic sessions (see the sketch below). For a big number of sessions, it might take a long time before the first user could be sent; maintaining a user list would probably be cheaper. How do you think vl_api_nat44_user_dump_t could be fixed for NAT ED?
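To illustrate the aggregation idea, a hypothetical sketch (type and field names such as snat_session_t, s->in2out.addr, s->in2out.fib_index and snat_is_session_static () are illustrative and would need to be checked against the actual NAT plugin code):

    /* Build a per-(inside address, VRF) user list from the session pool. */
    typedef struct
    {
      ip4_address_t addr;
      u32 fib_index;
      u32 nsessions;       /* dynamic sessions */
      u32 nstaticsessions; /* static sessions */
    } user_summary_t;

    static user_summary_t *
    build_user_list (snat_session_t *session_pool)
    {
      user_summary_t *users = 0; /* vector of summaries */
      uword *index_by_key = hash_create (0, sizeof (uword));
      snat_session_t *s;

      pool_foreach (s, session_pool)
        {
          /* Pack (address, fib index) into a single 64-bit hash key. */
          u64 key = (u64) s->in2out.addr.as_u32 << 32 | s->in2out.fib_index;
          uword *p = hash_get (index_by_key, key);
          user_summary_t *u;

          if (p == 0)
            {
              vec_add2 (users, u, 1);
              u->addr = s->in2out.addr;
              u->fib_index = s->in2out.fib_index;
              u->nsessions = u->nstaticsessions = 0;
              hash_set (index_by_key, key, u - users);
            }
          else
            u = users + p[0];

          if (snat_is_session_static (s))
            u->nstaticsessions++;
          else
            u->nsessions++;
        }

      hash_free (index_by_key);
      return users;
    }

As noted above, this walks the whole session pool before the first user can be sent, which is why maintaining the list incrementally might be cheaper.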
Re: [vpp-dev] Events for IP address addition/deletion on an interface #vpp
Hi Ole,

> Where is the IP address configuration coming from? If it's your
> application that configures the addresses, shouldn't the control plane
> application know that itself?

There are several independent instances of the application. If one of them configures the addresses, the others should know about it.

Regards,
Alexander
[vpp-dev] Events for IP address addition/deletion on an interface #vpp
Hello,

I have an application that is a client of the shared memory API, and I would like to know when an IP address has been added to or deleted from an interface. I see that there is sw_interface_event, which can notify a client about interface admin/link status changes as well as interface deletion. If I extend sw_interface_event with ipv4_address/ipv6_address flags indicating the corresponding change, would upstream accept this?

Regards,
Alexander
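For illustration, the extension could look something like this (a hypothetical sketch against the flag-based sw_interface_event found in current VPP; the two *_ADDR_CHANGE values are only the idea being proposed, not existing code):

    /* Existing interface status flags (see vnet/interface_types.api),
     * plus two hypothetical additions for address changes. */
    typedef enum
    {
      IF_STATUS_API_FLAG_ADMIN_UP = 1,
      IF_STATUS_API_FLAG_LINK_UP = 2,
      /* proposed, hypothetical: */
      IF_STATUS_API_FLAG_IP4_ADDR_CHANGE = 4,
      IF_STATUS_API_FLAG_IP6_ADDR_CHANGE = 8,
    } vl_api_if_status_flags_t;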