Re: [vpp-dev] "unix-cli-local" node corruption in node_by_name hash #vpp #vpp-dev
Yes Dave, we initiate the loopback delete from our application. However, I don't see anything suspicious in this code path; my guess is that the hash was already corrupted. Regarding my question about the thread barrier lock, I was referring to the current patch in unix_cli_file_add(), where we modify the hash. Normally, a node rename goes through vlib_node_rename(), which is always called from the main thread and takes the thread barrier lock.

-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#19615): https://lists.fd.io/g/vpp-dev/message/19615 Mute This Topic: https://lists.fd.io/mt/83471274/21656 Mute #vpp:https://lists.fd.io/g/vpp-dev/mutehashtag/vpp Mute #vpp-dev:https://lists.fd.io/g/vpp-dev/mutehashtag/vpp-dev Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Re: [vpp-dev] "unix-cli-local" node corruption in node_by_name hash #vpp #vpp-dev
Hi Dave,

I observed this node key corruption in the node_by_name hash while debugging a VPP crash in a node_by_name hash access; the hash seems to be corrupted. I thought the unix-cli-local node might be the root cause, but we saw the crash again even with the patch applied. The backtrace is below:

* Frame 00: /lib64/libpthread.so.0(+0x14a90) [0x7ff07d7dea90]
* Frame 01: /lib64/libvppinfra.so.20.05.1(hash_memory+0x30) [0x7ff07d9548a0]
* Frame 02: /lib64/libvppinfra.so.20.05.1(+0x23ce0) [0x7ff07d954ce0]
* Frame 03: /lib64/libvppinfra.so.20.05.1(_hash_set3+0xfd) [0x7ff07d955dfd]
* Frame 04: /lib64/libvppinfra.so.20.05.1(+0x24c71) [0x7ff07d955c71]
* Frame 05: /lib64/libvlib.so.20.05.1(vlib_node_rename+0xa9) [0x7ff07daf2b49]
* Frame 06: /lib64/libvnet.so.20.05.1(vnet_delete_hw_interface+0x4eb) [0x7ff07eae3f9b]
* Frame 07: /lib64/libvnet.so.20.05.1(ethernet_delete_interface+0x713) [0x7ff07eb14673]
* Frame 08: /lib64/libvnet.so.20.05.1(vnet_delete_loopback_interface+0x119) [0x7ff07eb16169]
* Frame 09: /lib64/libvnet.so.20.05.1(+0xf816b4) [0x7ff07eaf06b4]
* Frame 10: /lib64/libvlibmemory.so.20.05.1(vl_msg_api_socket_handler+0x11c) [0x7ff07f3a101c]
* Frame 11: /lib64/libvlibmemory.so.20.05.1(vl_socket_process_api_msg+0x18) [0x7ff07f38e9b8]

I have one question regarding the changes: don't we need to take the thread barrier lock before updating the node_by_name hash?

Regards,
Sontu

View/Reply Online (#19610): https://lists.fd.io/g/vpp-dev/message/19610
Re: [vpp-dev] "unix-cli-local" node corruption in node_by_name hash #vpp #vpp-dev
Thanks for the patch, Dave. With it, I no longer see the node name key corruption in the node_by_name hash.

Regards,
Sontu

On Fri, 11 Jun 2021, 9:34 PM Dave Barach wrote:
> Please try these diffs and report results.
>
> diff --git a/src/vlib/unix/cli.c b/src/vlib/unix/cli.c
> index 6c9886725..ce29e0723 100644
> --- a/src/vlib/unix/cli.c
> +++ b/src/vlib/unix/cli.c
> @@ -2863,6 +2863,7 @@ unix_cli_file_add (unix_cli_main_t * cm, char *name, int fd)
>  {
>    unix_main_t *um = &unix_main;
>    clib_file_main_t *fm = &file_main;
> +  vlib_node_main_t *nm = &vlib_get_main()->node_main;
>    unix_cli_file_t *cf;
>    clib_file_t template = { 0 };
>    vlib_main_t *vm = um->vlib_main;
> @@ -2896,10 +2897,12 @@ unix_cli_file_add (unix_cli_main_t * cm, char *name, int fd)
>            old_name = n->name;
>            n->name = (u8 *) name;
>          }
> +      ASSERT(old_name);
> +      hash_unset (nm->node_by_name, old_name);
> +      hash_set (nm->node_by_name, name, n->index);
>        vec_free (old_name);
>        vlib_node_set_state (vm, n->index, VLIB_NODE_STATE_POLLING);
> -
>        _vec_len (cm->unused_cli_process_node_indices) = l - 1;
>      }
>    else

View/Reply Online (#19563): https://lists.fd.io/g/vpp-dev/message/19563
Re: [vpp-dev] "unix-cli-local" node corruption in node_by_name hash #vpp #vpp-dev
Thanks Dave. I will try with your suggested code changes and will share the result.

Regards,
Sontu

View/Reply Online (#19562): https://lists.fd.io/g/vpp-dev/message/19562
[vpp-dev] "unix-cli-local" node corruption in node_by_name hash #vpp #vpp-dev
Hi,

I observe that the node_by_name hash contains a node named "unix-cli-local:0" with node index 720 (I am not sure what this node is for). The node name is stored as the key in the node_by_name hash.

Later, when I print each entry of the node_by_name hash, the key for this node (i.e. the node name) prints as a junk value (I identified the entry by checking it against the node index).

Looking at the code in unix_cli_file_add(), this is where the node named "unix-cli-local:0" is added the first time:

    static vlib_node_registration_t r = {
      .function = unix_cli_process,
      .type = VLIB_NODE_TYPE_PROCESS,
      .process_log2_n_stack_bytes = 18,
    };

    r.name = name;

    vlib_worker_thread_barrier_sync (vm);

    vlib_register_node (vm, &r);
    vec_free (name);

    n = vlib_get_node (vm, r.index);
    vlib_worker_thread_node_runtime_update ();
    vlib_worker_thread_barrier_release (vm);

Later, unix_cli_file_add() is called again with a different name, "unix-cli-local:1". In this case the existing node is renamed from "unix-cli-local:0" to "unix-cli-local:1":

    for (i = 0; i < vec_len (vlib_mains); i++)
      {
        this_vlib_main = vlib_mains[i];
        if (this_vlib_main == 0)
          continue;
        n = vlib_get_node (this_vlib_main,
                           cm->unused_cli_process_node_indices[l - 1]);
        old_name = n->name;      <<<
        n->name = (u8 *) name;   <<<
      }
    vec_free (old_name);         <<<

But the old node name is still present as a key in the node_by_name hash. Instead of updating the hash, the code frees the old name. As a result, the hash entry for this node prints a corrupted name, and I think this can sometimes crash VPP as well, because the hash key points to freed memory.

Regards,
Sontu

View/Reply Online (#19560): https://lists.fd.io/g/vpp-dev/message/19560
[vpp-dev] node_by_name hash table getting corrupted #vpp-dev
Hi,

We are intermittently seeing a VPP crash while accessing the node_by_name hash; it looks like node_by_name is getting corrupted. The backtrace is below:

* Frame 01: /lib64/libvppinfra.so.20.05.1(hash_memory+0x34) [0x7f1dae34c8a4]
* Frame 02: /lib64/libvppinfra.so.20.05.1(+0x23ce0) [0x7f1dae34cce0]
* Frame 03: /lib64/libvppinfra.so.20.05.1(_hash_set3+0xfd) [0x7f1dae34ddfd]
* Frame 04: /lib64/libvppinfra.so.20.05.1(+0x24c71) [0x7f1dae34dc71]
* Frame 05: /lib64/libvlib.so.20.05.1(vlib_node_rename+0xa9) [0x7f1dae4eab49]
* Frame 06: /lib64/libvnet.so.20.05.1(vnet_delete_hw_interface+0x4eb) [0x7f1daf4d7b9b]
* Frame 07: /lib64/libvnet.so.20.05.1(ethernet_delete_interface+0x713) [0x7f1daf508273]
* Frame 08: /lib64/libvnet.so.20.05.1(vnet_delete_loopback_interface+0x119) [0x7f1daf509d69]
* Frame 09: /lib64/libvnet.so.20.05.1(+0xf7d2b4) [0x7f1daf4e42b4]
* Frame 10: /lib64/libvlibmemory.so.20.05.1(vl_msg_api_socket_handler+0x11c) [0x7f1dafd9401c]
* Frame 11: /lib64/libvlibmemory.so.20.05.1(vl_socket_process_api_msg+0x18) [0x7f1dafd819b8]
* Frame 12: /lib64/libvlibmemory.so.20.05.1(+0x14241) [0x7f1dafd86241]
* Frame 13: /lib64/libvlib.so.20.05.1(+0x111c27) [0x7f1dae4e8c27]
* Frame 14: /lib64/libvppinfra.so.20.05.1(+0x296f4) [0x7f1dae3526f4]

Our custom plugin code does not modify this hash; all we use is the vlib_get_node_by_name() function. All node registration happens during initialization. Only loopback interfaces can be added or deleted dynamically; for these we use the standard VAPI functions, which internally call register_node() for a create and vlib_node_rename() for a loopback delete.

Can someone give any clue as to when this node_by_name hash may get corrupted?

Regards,
Sontu

View/Reply Online (#19498): https://lists.fd.io/g/vpp-dev/message/19498
[vpp-dev] VPP crash observed in hash_memory64() while creating loopback interface #vpp #vpp-dev
Hi,

I am using the FDIO 20.05 release. We are configuring a loopback interface via VAPI, and in our testing we see VPP crash. The crash is very hard to reproduce and has been seen only 2-3 times so far. Below is the backtrace:

#0 0x7f09134041a2 in hash_memory64 (state=, n_bytes=, p=) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vppinfra/hash.c:276
#1 hash_memory (p=0x7f08e8c26ef0, n_bytes=8, state=0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vppinfra/hash.c:280
#2 0x7f09134051e5 in key_sum (key=139676241522416, h=0x7f08e93921f0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vppinfra/hash.c:341
#3 lookup (v=v@entry=0x7f08e9392330, key=key@entry=139676241522416, op=op@entry=SET, new_value=0x7f08d372e618, old_value=0x0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vppinfra/hash.c:556
#4 0x7f091340599e in _hash_set3 (v=0x7f08e9392330, key=139676241522416, value=, old_value=old_value@entry=0x0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vppinfra/hash.c:848
#5 0x7f0913405b8c in hash_resize_internal (old=old@entry=0x7f08d372e610, new_size=, free_old=free_old@entry=1) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vppinfra/hash.c:816
#6 0x7f0913405c3a in hash_resize (old=old@entry=0x7f08d372e610, new_size=) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vppinfra/hash.c:830
#7 0x7f09134059c8 in _hash_set3 (v=0x7f08d372e610, key=139675881216528, value=value@entry=0x7f08d36ddb60, old_value=old_value@entry=0x0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vppinfra/hash.c:853
#8 0x7f0913d22baa in register_node () at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vlib/node.c:382
#9 0x7f0913d248b9 in vlib_register_node (vm=vm@entry=0x7f0913f79300 , r=r@entry=0x7f08d36ddc60) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vlib/node.c:530
#10 0x7f0914ae6107 in vnet_register_interface (vnm=vnm@entry=0x7f09153ad180 , dev_class_index=30, dev_instance=dev_instance@entry=6, hw_class_index=, hw_instance=9) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vnet/interface.c:911
#11 0x7f0914b1580b in ethernet_register_interface (vnm=vnm@entry=0x7f09153ad180 , dev_class_index=, dev_instance=dev_instance@entry=6, address=address@entry=0x7f08d36dddea "ޭ", hw_if_index_return=hw_if_index_return@entry=0x7f08d36ddde4, flag_change=flag_change@entry=0x0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vnet/ethernet/interface.c:347
#12 0x7f0914b1690b in vnet_create_loopback_interface (sw_if_indexp=sw_if_indexp@entry=0x7f08d36dde3c, mac_address=mac_address@entry=0x7f08d36dde42 "", is_specified=is_specified@entry=0 '\000', user_instance=user_instance@entry=0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vnet/ethernet/interface.c:859
#13 0x7f0914aedaef in vl_api_create_loopback_t_handler (mp=0x7f08e9795bd0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vnet/interface_api.c:1365
#14 0x7f09153f6538 in msg_handler_internal (free_it=0, do_it=1, trace_it=, the_msg=0x7f08e9795bd0, am=0x7f0915602ea0 ) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vlibapi/api_shared.c:488
#15 vl_msg_api_handler_no_free (the_msg=0x7f08e9795bd0) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vlibapi/api_shared.c:675
#16 0x7f09153e8755 in vl_socket_process_api_msg (rp=, input_v=) at /usr/src/debug/vpp-20.05.1-2~gcb5420544_dirty.x86_64/src/vlibmemory/socket_api.c:199

In hash_memory64(), the crash is at the line marked below:

    static inline u64
    hash_memory64 (void *p, word n_bytes, u64 state)
    {
      . . .
      a = b = 0x9e3779b97f4a7c13LL;
      c = state;
      n = n_bytes;

      while (n >= 3 * sizeof (u64))
        {
          a += clib_mem_unaligned (q + 0, u64);
          b += clib_mem_unaligned (q + 1, u64);
          c += clib_mem_unaligned (q + 2, u64);   <<< crash is seen in this line
          hash_mix64 (a, b, c);
          n -= 3 * sizeof (u64);
          q += 3;
        }

This happens during initialization, when VPP is freshly coming up. Apart from the backtrace, we currently have no other information. Can someone please point out what could go wrong, based on the backtrace? Any clue will help us debug this crash in the right direction.

Regards,
Sontu

View/Reply Online (#18889): https://lists.fd.io/g/vpp-dev/message/18889
[vpp-dev] VPP crash @fib_entry_delegate_get during ipv6 address delete #vpp
Hi,

I am seeing a VPP crash during IPv6 address deletion. Below is the backtrace:

Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
fib_entry_delegate_get (fib_entry=fib_entry@entry=0x80214a9e9af4, type=type@entry=FIB_ENTRY_DELEGATE_COVERED) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_delegate.c:51
51      return (fib_entry_delegate_find_i(fib_entry, type, NULL));
(gdb) bt
#0 fib_entry_delegate_get (fib_entry=fib_entry@entry=0x80214a9e9af4, type=type@entry=FIB_ENTRY_DELEGATE_COVERED) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_delegate.c:51
#1 0x7fd983d57634 in fib_entry_cover_untrack (cover=cover@entry=0x80214a9e9af4, tracked_index=84) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_cover.c:51
#2 0x7fd983d57060 in fib_entry_src_adj_deactivate (src=src@entry=0x7fd94a5cbe34, fib_entry=fib_entry@entry=0x7fd94a9ea94c) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_src_adj.c:298
#3 0x7fd983d57131 in fib_entry_src_adj_cover_change (src=0x7fd94a5cbe34, fib_entry=0x7fd94a9ea94c) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_src_adj.c:340
#4 0x7fd983d5365d in fib_entry_cover_changed (fib_entry_index=fib_entry_index@entry=50) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry.c:1241
#5 0x7fd983d57587 in fib_entry_cover_change_one (cover=, covered=50, args=) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_cover.c:132
#6 0x7fd983d57534 in fib_entry_cover_walk_node_ptr (depend=, args=) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_cover.c:80
#7 0x7fd983d51deb in fib_node_list_walk (list=, fn=fn@entry=0x7fd983d57520 , args=args@entry=0x7fd94585bc80) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_node_list.c:375
#8 0x7fd983d576c0 in fib_entry_cover_walk (cover=0x7fd94a9ea82c, walk=walk@entry=0x7fd983d57540 , args=args@entry=0x) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_cover.c:104
#9 0x7fd983d576ea in fib_entry_cover_change_notify (cover_index=cover_index@entry=46, covered=covered@entry=4294967295) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_entry_cover.c:158
#10 0x7fd983d49ee9 in fib_table_entry_delete_i (fib_index=, fib_entry_index=46, prefix=0x7fd94585bd00, source=FIB_SOURCE_INTERFACE) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_table.c:837
#11 0x7fd983d4afe4 in fib_table_entry_delete (fib_index=, prefix=, source=) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/fib/fib_table.c:872
#12 0x7fd983ac152f in ip6_del_interface_routes (fib_index=fib_index@entry=1, address_length=address_length@entry=112, address=, im=0x7fd9840d7a60 ) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/ip/ip6_forward.c:133
#13 0x7fd983ac3977 in ip6_add_del_interface_address (vm=vm@entry=0x7fd983656380 , sw_if_index=9, address=address@entry=0x7fd94a35764e, address_length=112, is_del=1) at /usr/src/debug/vpp-18.10-35~g7002cae21_dirty.x86_64/src/vnet/ip/ip6_forward.c:279
#14 0x7fd9839d87b2 in vl_api_sw_interface_add_del_address_t_handler (mp=0x7fd94a35763c) at /usr/include/bits/byteswap.h:47

Steps to reproduce the crash:

The following IPv6 addresses are configured:
  VPP:         2001:db8:0:1:10:164:4:34/112
  Peer router: 2001:db8:0:1:10:164:4:33/112

After configuring the addresses, I pinged the peer, which created an IPv6 neighbor and its fib_entry in VPP.

vpp# show ip6 neighbor
    Time           Address                  Flags     Link layer                    Interface
  404.2853  2001:db8:0:1:10:164:4:33          D     f8:c0:01:18:9a:c0  VirtualFuncEthernet0/7/0.1504
vpp# show ip6 fib table 1
nc1, fib_index:1, flow hash:[src dst sport dport proto ] locks:[src:API:2, ]
2001:db8:0:1:10:164:4:33/128   <<< ipv6 neighbor fib entry
  unicast-ip6-chain
  [@0]: dpo-load-balance: [proto:ip6 index:50 buckets:1 uRPF:62 to:[2:208]]
    [0] [@5]: ipv6 via 2001:db8:0:1:10:164:4:33 VirtualFuncEthernet0/7/0.1504: mtu:1500 f8c001189ac0fa163ec07038810005e086dd
2001:db8:0:1:10:164:4:0/112
  unicast-ip6-chain
  [@0]: dpo-load-balance: [proto:ip6 index:44 buckets:1 uRPF:57 to:[1:104]]
    [0] [@4]: ipv6-glean: VirtualFuncEthernet0/7/0.1504: mtu:9000 fa163ec07038810005e086dd
2001:db8:0:1:10:164:4:34/128
  unicast-ip6-chain
  [@0]: dpo-load-balance: [proto:ip6 index:45 buckets:1 uRPF:58 to:[7:592]]
    [0] [@2]: dpo-receive: 2001:db8:0:1:10:164:4:34 on VirtualFuncEthernet0/7/0.1504
vpp#

Next, on our side, we changed the IPv6 address to the same address as the peer router, i.e. 2001:db8:0:1:10:164:4:33:

vpp# show ip6 fib table 1
nc1, fib_index:1, flow hash:[src dst sport dport proto ]
Re: [vpp-dev] ECMP seems to have issue if path is more than 2 #ecmp #vpp
Thanks a lot Neale, that answered my question.

Regards,
Sontu

On Sun 29 Mar 2020, 10:02 PM Neale Ranns (nranns) wrote:
> Hi Sontu,
>
> Please let me refer you to a previous answer to this question:
>
> https://www.mail-archive.com/search?l=vpp-dev@lists.fd.io=subject:%22%5C%5Bvpp%5C-dev%5C%5D+multipath+dpo+buckets+is+wrong.%22=newest=1
>
> /neale
[vpp-dev] ECMP seems to have issue if path is more than 2 #ecmp #vpp
Hi,

I am using fdio 18.10. I observed that when I try to configure a route with more than 2 paths, the show ip fib output displays many duplicate entries.

This is what I am trying. I have 3 interfaces as below:

vpp# show interface address
VirtualFunctionEthernet0/6/0 (up):
VirtualFunctionEthernet0/6/0.1 (up):
  L3 10.10.10.1/24 ip4 table-id 1 fib-idx 1
VirtualFunctionEthernet0/6/0.2 (up):
  L3 20.20.20.1/24 ip4 table-id 1 fib-idx 1
VirtualFunctionEthernet0/6/0.3 (up):
  L3 30.30.30.1/24 ip4 table-id 1 fib-idx 1

I am adding the route as below.

1st path:

vpp# ip route add 2.2.2.2/32 table 1 via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
vpp# show ip fib table 1 2.2.2.2/32
nc1, fib_index:1, flow hash:[src dst sport dport proto ] locks:[src:API:4, ]
2.2.2.2/32 fib:1 index:44 locks:2
  src:CLI refs:1 src-flags:added,contributing,active,
    path-list:[51] locks:2 flags:shared, uPRF-list:53 len:1 itfs:[5, ]
      path:[59] pl-index:51 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        10.10.10.2 VirtualFunctionEthernet0/6/0.1
      [@0]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
 forwarding: unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:46 buckets:1 uRPF:53 to:[0:0]]
    [0] [@3]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
vpp#

2nd path:

vpp# ip route add 2.2.2.2/32 table 1 via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
vpp# show ip fib table 1 2.2.2.2/32
nc1, fib_index:1, flow hash:[src dst sport dport proto ] locks:[src:API:4, ]
2.2.2.2/32 fib:1 index:44 locks:2
  src:CLI refs:1 src-flags:added,contributing,active,
    path-list:[53] locks:2 flags:shared, uPRF-list:55 len:2 itfs:[5, 6, ]
      path:[62] pl-index:53 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        10.10.10.2 VirtualFunctionEthernet0/6/0.1
      [@0]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
      path:[61] pl-index:53 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        20.20.20.2 VirtualFunctionEthernet0/6/0.2
      [@0]: arp-ipv4: via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
 forwarding: unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:46 buckets:2 uRPF:55 to:[0:0]]
    [0] [@3]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
    [1] [@3]: arp-ipv4: via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
vpp#

3rd path:

vpp# ip route add 2.2.2.2/32 table 1 via 30.30.30.2 VirtualFunctionEthernet0/6/0.3
vpp# show ip fib table 1 2.2.2.2/32
nc1, fib_index:1, flow hash:[src dst sport dport proto ] locks:[src:API:4, ]
2.2.2.2/32 fib:1 index:44 locks:2
  src:CLI refs:1 src-flags:added,contributing,active,
    path-list:[51] locks:2 flags:shared, uPRF-list:53 len:3 itfs:[5, 6, 7, ]
      path:[63] pl-index:51 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        10.10.10.2 VirtualFunctionEthernet0/6/0.1
      [@0]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
      path:[64] pl-index:51 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        20.20.20.2 VirtualFunctionEthernet0/6/0.2
      [@0]: arp-ipv4: via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
      path:[59] pl-index:51 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        30.30.30.2 VirtualFunctionEthernet0/6/0.3
      [@0]: arp-ipv4: via 30.30.30.2 VirtualFunctionEthernet0/6/0.3
 forwarding: unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:46 buckets:16 uRPF:53 to:[0:0]]
    [0] [@3]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
    [1] [@3]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
    [2] [@3]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
    [3] [@3]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
    [4] [@3]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
    [5] [@3]: arp-ipv4: via 10.10.10.2 VirtualFunctionEthernet0/6/0.1
    [6] [@3]: arp-ipv4: via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
    [7] [@3]: arp-ipv4: via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
    [8] [@3]: arp-ipv4: via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
    [9] [@3]: arp-ipv4: via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
    [10] [@3]: arp-ipv4: via 20.20.20.2 VirtualFunctionEthernet0/6/0.2
    [11] [@3]: arp-ipv4: via 30.30.30.2 VirtualFunctionEthernet0/6/0.3
    [12] [@3]: arp-ipv4: via 30.30.30.2 VirtualFunctionEthernet0/6/0.3
    [13] [@3]: arp-ipv4: via 30.30.30.2 VirtualFunctionEthernet0/6/0.3
    [14] [@3]: arp-ipv4: via 30.30.30.2 VirtualFunctionEthernet0/6/0.3
    [15] [@3]: arp-ipv4: via 30.30.30.2 VirtualFunctionEthernet0/6/0.3
vpp#

Once I add the 3rd path, as you can see above, I get multiple duplicate entries for the next-hops. Is this a bug, or is this output expected? Can someone please help with this?

Regards,
Sontu

View/Reply Online (#15900): https://lists.fd.io/g/vpp-dev/message/15900
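One plausible reading of the 6/5/5 spread is that the load-balance object only uses power-of-two bucket counts and rounds each path's share of the weight to whole buckets, so three equal paths cannot be split evenly. The C sketch below models that idea under stated assumptions: it is based on our reading of VPP's load-balance normalization, not copied from it; the 10% average-error tolerance and giving leftover buckets to the first path are assumptions; and `normalize_buckets` is a hypothetical name, not a VPP API.

```c
/* Hypothetical cap so the search loop is always bounded. */
#define MAX_BUCKETS (1u << 16)

/* Smallest power of two >= n (n >= 1). */
static unsigned next_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Sketch (assumption, not VPP code): try power-of-two bucket counts,
 * starting at next_pow2(n_paths), until the summed rounding error is at
 * most `tolerance` per bucket on average. counts[i] receives the number
 * of buckets assigned to path i. Returns the bucket count, or 0 if the
 * input is unsupported or no size up to MAX_BUCKETS is accurate enough.
 */
static unsigned normalize_buckets(const unsigned *weights, unsigned *counts,
                                  unsigned n_paths, double tolerance)
{
    unsigned sum = 0;

    if (n_paths == 0)
        return 0;
    for (unsigned i = 0; i < n_paths; i++) {
        if (weights[i] == 0)
            return 0;                  /* zero-weight paths not handled here */
        sum += weights[i];
    }

    for (unsigned n_buckets = next_pow2(n_paths); n_buckets <= MAX_BUCKETS;
         n_buckets *= 2) {
        double norm = (double) n_buckets / sum;
        double error = 0.0;
        unsigned left = n_buckets;
        int starved = 0;

        for (unsigned i = 0; i < n_paths; i++) {
            double ideal = weights[i] * norm;        /* fractional share */
            unsigned n = (unsigned) (ideal + 0.5);   /* round to nearest */
            if (n > left)
                n = left;
            left -= n;
            error += ideal > n ? ideal - n : n - ideal;
            counts[i] = n;
            if (n == 0) {
                starved = 1;           /* a path got no bucket: grow table */
                break;
            }
        }
        if (starved)
            continue;

        counts[0] += left;             /* leftover buckets go to path 0 */
        if (error <= tolerance * n_buckets)
            return n_buckets;
    }
    return 0;
}
```

With weights {1, 1, 1} and a tolerance of 0.1, the 4-bucket attempt (total error 1.0) and the 8-bucket attempt (total error about 1.33) are rejected, and 16 buckets are accepted with counts 6, 5, 5, which matches the bucket layout in the 3rd-path output above. With two equal paths the same sketch stops at 2 buckets, as in the 2nd-path output. Under this model the repeated next-hops are weighting, not duplicated routes.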
Re: [vpp-dev] BFD sends old remote discriminator in its control packet after session goes to DOWN state #vpp
Hi Klement,

Thanks for the patch file. The fix works. Now I can see that once the BFD session goes DOWN due to the inactivity timer, the remote discriminator is set to 0 in the BFD control packet.

On Fri, Jan 17, 2020 at 3:37 PM Klement Sekera -X (ksekera - PANTHEON TECH SRO at Cisco) wrote:
> Hi,
>
> thank you for your report.
>
> Can you please apply this fix and verify that the behaviour is now correct?
>
> https://gerrit.fd.io/r/c/vpp/+/24388
>
> Thanks,
> Klement
>
> > On 17 Jan 2020, at 07:04, sont...@gmail.com wrote:
> >
> > Hi,
> >
> > I have observed incorrect behavior in the BFD code of VPP.
> > I brought a BFD session UP between VPP and a peer router.
> > Due to an interface shutdown on the peer router, the BFD session on VPP goes to the DOWN state.
> > Once it is DOWN, it continuously sends control packets carrying its old remote discriminator in the "Your Discriminator" field.
> > This is wrong. The RFC section below says that once BFD goes DOWN due to non-receipt of BFD control packets, the "Your Discriminator" field should be set to zero.
> >
> > RFC 5880 6.8.1. State Variables
> >
> >    bfd.RemoteDiscr
> >
> >       The remote discriminator for this BFD session. This is the
> >       discriminator chosen by the remote system, and is totally opaque
> >       to the local system. This MUST be initialized to zero. If a
> >       period of a Detection Time passes without the receipt of a valid,
> >       authenticated BFD packet from the remote system, this variable
> >       MUST be set to zero.

View/Reply Online (#15201): https://lists.fd.io/g/vpp-dev/message/15201
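The RFC rule quoted above can be illustrated with a small C sketch of a session's detection-time check. This is not the gerrit 24388 patch and does not use VPP's BFD structures; `bfd_session_t`, `bfd_on_rx`, `bfd_check_detection` and `bfd_tx_your_discr` are hypothetical names. It shows the intended behavior: on expiry, bfd.RemoteDiscr is cleared (RFC 5880 6.8.1), an Init/Up session drops to Down with diagnostic 1, "Control Detection Time Expired" (6.8.4), and outgoing packets carry bfd.RemoteDiscr as "Your Discriminator" (6.8.7), so they now carry zero.

```c
#include <stdint.h>

/* Hypothetical session model; names are illustrative, not VPP's. */
typedef enum { BFD_ADMIN_DOWN, BFD_DOWN, BFD_INIT, BFD_UP } bfd_state_t;

/* RFC 5880 section 4.1: diagnostic 1 = Control Detection Time Expired. */
#define BFD_DIAG_CTRL_DETECT_EXPIRED 1

typedef struct {
    bfd_state_t state;
    uint8_t local_diag;
    uint32_t local_discr;          /* bfd.LocalDiscr */
    uint32_t remote_discr;         /* bfd.RemoteDiscr */
    uint64_t last_rx_usec;         /* time of last valid control packet */
    uint64_t detection_time_usec;  /* negotiated Detection Time */
} bfd_session_t;

/* A valid control packet arrived: remember the peer's My Discriminator. */
static void bfd_on_rx(bfd_session_t *s, uint32_t peer_my_discr,
                      uint64_t now_usec)
{
    s->remote_discr = peer_my_discr;
    s->last_rx_usec = now_usec;
}

/* Periodic check, assuming a monotonic clock. */
static void bfd_check_detection(bfd_session_t *s, uint64_t now_usec)
{
    if (now_usec < s->last_rx_usec ||
        now_usec - s->last_rx_usec < s->detection_time_usec)
        return;                    /* Detection Time has not elapsed */

    s->remote_discr = 0;           /* 6.8.1: MUST be set to zero */
    if (s->state == BFD_INIT || s->state == BFD_UP) {
        s->state = BFD_DOWN;       /* 6.8.4: session has gone down */
        s->local_diag = BFD_DIAG_CTRL_DETECT_EXPIRED;
    }
}

/* Value for the "Your Discriminator" field of outgoing packets. */
static uint32_t bfd_tx_your_discr(const bfd_session_t *s)
{
    return s->remote_discr;
}
```

For example, with a 3-second Detection Time and a last packet received at t = 0, a check at t = 1 s leaves the session Up with the peer's discriminator, while a check at t = 3.5 s moves it to Down and `bfd_tx_your_discr` returns 0, which is the behavior the reporter expected after the fix.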