Re: [vpp-dev] det44 and map plugins interfere with linux-cp
On 19/11/2021 13:32, Ben McKeegan via lists.fd.io wrote: Firstly, with the map plugin it appears to break IPv6 connectivity: the control plane can no longer successfully do NDP to the external gateway (a layer 3 switch). NDP replies from the gateway to the control plane do not arrive. There is a very simple workaround: if I put in a static neighbour entry in Linux (with 'ip neigh replace ...') everything else works. I have not yet understood why this happens although as I have a workaround I did not spent too long on investigating it. It turns out this was fairly straightforward, see patch below which fixed it for me. Previously, the code checked for ICMP6 echo request and reply codes and handled these locally, attempting to relay everything else (and discarding any that are not suitable for relaying). For now I have added similar exceptions for NDP and RAs, but this seems a little backward to me. Should we make IP6_MAP_NEXT_IP6_LOCAL the default and only set IP6_MAP_NEXT_IP6_ICMP_RELAY for one of the four ICMP6 error codes that ip6_map_icmp_relay() actually checks for? The comment in the code says: * ICMP IPv6 packet * - Error -> Pass to ICMPv6/ICMPv4 relay * - Info -> Pass to IPv6 local ... which makes sense, but doesn't match what the code was doing. diff --git a/src/plugins/map/ip6_map.c b/src/plugins/map/ip6_map.c index 1193dda0a..d400c634c 100644 --- a/src/plugins/map/ip6_map.c +++ b/src/plugins/map/ip6_map.c @@ -246,8 +246,11 @@ ip6_map (vlib_main_t * vm, vlib_node_runtime_t * node, vlib_frame_t * frame) { icmp46_header_t *icmp = (void *) (ip60 + 1); next0 = (icmp->type == ICMP6_echo_request - || icmp->type == - ICMP6_echo_reply) ? IP6_MAP_NEXT_IP6_LOCAL : + || icmp->type == ICMP6_echo_reply + || icmp->type == ICMP6_neighbor_solicitation + || icmp->type == ICMP6_neighbor_advertisement + || icmp->type == ICMP6_router_solicitation + || icmp->type == ICMP6_router_advertisement) ? IP6_MAP_NEXT_IP6_LOCAL : IP6_MAP_NEXT_IP6_ICMP_RELAY; } else if (ip60->protocol == IP_PROTOCOL_IPV6_FRAGMENTATION) @@ -273,8 +276,11 @@ ip6_map (vlib_main_t * vm, vlib_node_runtime_t * node, vlib_frame_t * frame) { icmp46_header_t *icmp = (void *) (ip61 + 1); next1 = (icmp->type == ICMP6_echo_request - || icmp->type == - ICMP6_echo_reply) ? IP6_MAP_NEXT_IP6_LOCAL : + || icmp->type == ICMP6_echo_reply + || icmp->type == ICMP6_neighbor_solicitation + || icmp->type == ICMP6_neighbor_advertisement + || icmp->type == ICMP6_router_solicitation + || icmp->type == ICMP6_router_advertisement) ? IP6_MAP_NEXT_IP6_LOCAL : IP6_MAP_NEXT_IP6_ICMP_RELAY; } else if (ip61->protocol == IP_PROTOCOL_IPV6_FRAGMENTATION) @@ -451,8 +457,11 @@ ip6_map (vlib_main_t * vm, vlib_node_runtime_t * node, vlib_frame_t * frame) { icmp46_header_t *icmp = (void *) (ip60 + 1); next0 = (icmp->type == ICMP6_echo_request - || icmp->type == - ICMP6_echo_reply) ? IP6_MAP_NEXT_IP6_LOCAL : + || icmp->type == ICMP6_echo_reply + || icmp->type == ICMP6_neighbor_solicitation + || icmp->type == ICMP6_neighbor_advertisement + || icmp->type == ICMP6_router_solicitation + || icmp->type == ICMP6_router_advertisement) ? IP6_MAP_NEXT_IP6_LOCAL : IP6_MAP_NEXT_IP6_ICMP_RELAY; } else if (ip60->protocol == IP_PROTOCOL_IPV6_FRAGMENTATION && Regards, Ben. -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#20523): https://lists.fd.io/g/vpp-dev/message/20523 Mute This Topic: https://lists.fd.io/mt/87167458/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
[vpp-dev] det44 and map plugins interfere with linux-cp
Hi, I have been experimenting recently with use of the linux-cp plugin to run BGP, my intention being to implement failover and load-sharing of various networking functions such as map or det44 among pools of VPP instances. (The map plugin's protocols are ideally suited to BGP anycasting, whereas for det44 I am running VRRP on the inside interface and will be adjusting my BGP advertisements of the outside addresses based on the inside VRRP status) linux-cp works a treat on its own (v21.10) and I have no trouble establishing BGP sessions using FRR, but I've been hitting some issues when enabling extra plugins on the same interfaces that my BGP control plane is using. Firstly, with the map plugin it appears to break IPv6 connectivity: the control plane can no longer successfully do NDP to the external gateway (a layer 3 switch). NDP replies from the gateway to the control plane do not arrive. There is a very simple workaround: if I put in a static neighbour entry in Linux (with 'ip neigh replace ...') everything else works. I have not yet understood why this happens although as I have a workaround I did not spent too long on investigating it. Secondly, with the det44 plugin enabled (on a different instance) then all IPv4 connectivity to the control plane was broken. In my case (running BGP only on the 'outside' interface) this is because the det44_out2in_node function is rather overzealous: it grabs all IPv4 packets received and attempts to translate them, and if it has no matching mapping configuration for the destination address it will drop the packet. I have come up with a very simple fix which works for me: if the destination address belongs to the receiving interface, then skip the TTL check and NAT: diff --git a/src/plugins/nat/det44/det44_out2in.c b/src/plugins/nat/det44/det44_out2in.c index 111bc61c4..e128b794e 100644 --- a/src/plugins/nat/det44/det44_out2in.c +++ b/src/plugins/nat/det44/det44_out2in.c @@ -425,6 +425,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm, tcp0 = (tcp_header_t *) udp0; sw_if_index0 = vnet_buffer (b0)->sw_if_index[VLIB_RX]; + /* do not interfere with packets to the interface address (control plane) */ + if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index0, + ip0->dst_address.as_u32))) +goto trace0; if (PREDICT_FALSE (ip0->ttl == 1)) { @@ -543,6 +547,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm, tcp1 = (tcp_header_t *) udp1; sw_if_index1 = vnet_buffer (b1)->sw_if_index[VLIB_RX]; + /* do not interfere with packets to the interface address (control plane) */ + if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index1, + ip1->dst_address.as_u32))) +goto trace1; if (PREDICT_FALSE (ip1->ttl == 1)) { @@ -688,6 +696,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm, tcp0 = (tcp_header_t *) udp0; sw_if_index0 = vnet_buffer (b0)->sw_if_index[VLIB_RX]; + /* do not interfere with packets to the interface address (control plane) */ + if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index0, + ip0->dst_address.as_u32))) +goto trace00; if (PREDICT_FALSE (ip0->ttl == 1)) { This works well for me as I have BGP advertise routes to the outside prefixes of my det44 mappings, with the non-overlapping interface address as the next hop. The downside of this patch is if anyone is relying on being able to NAT using the first interface address, rather than a routed prefix or secondary address, then it would break it for them. Is that a use case we should try to retain support for? (Based on my recent discovery of the long standing session scavenging bug in this plugin, I suspect there are few if any other serious users of it.) My first attempt of the patch put the interface check after the mapping table lookup, inside the "if (PREDICT_FALSE (!mp0))" clauses, so it would still be possible to NAT the interface's address if you did not need to use if for control plane traffic. However, I found my BGP packets then fell foul of the TTL check in this node. Thus to continue to support NAT of the first interface address would require additional refactoring of the code to move the TTL check after the map lookup, and special handling for the ICMP code path too. I feel this would add too much complexity to the code for little gain. What do people think? Regards, Ben. -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#20521): https://lists.fd.io/g/vpp-dev/message/20521 Mute This Topic: https://lists.fd.io/mt/87167458/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Re: [vpp-dev] Memory leak in map plugin
Hi Ben, Thank you for the patch, that fixes it! Regards, Ben. On 17/11/2021 08:24, Benoit Ganne (bganne) via lists.fd.io wrote: Hi Ben, Thanks for the detailed report, you are correct we are missing a vec_free() before returning from the function. Can you give a try to https://gerrit.fd.io/r/c/vpp/+/34536 ? Best ben -Original Message- From: vpp-dev@lists.fd.io On Behalf Of Ben McKeegan Sent: mardi 16 novembre 2021 15:04 To: vpp-dev@lists.fd.io Subject: [vpp-dev] Memory leak in map plugin Hello, I have identified a memory leak the ip4_map function of src/plugins/ip4_map.c. I am using the 21.10 release. Enabling memory trace of the main-heap via the debug CLI and backtracing with gdb both point to all the leaked memory being allocated from the vec_add1(buffer0,pi0) macro at line 293 of ip4_map.c. In tests it is leaking approximately 50 bytes for every packet passing through this function (invariant on packet size). Here is an extract of the relevant code: exit: /* Send fragments that were added in the frame */ if (free_original_buffer0) { vlib_buffer_free_one (vm, pi0); /* Free original packet */ } else { vec_add1 (buffer0, pi0); <<<< leak is here on line 293 } frag_from0 = buffer0; frag_left0 = vec_len (buffer0); while (frag_left0 > 0) { while (frag_left0 > 0 && n_left_to_next > 0) { u32 i0; i0 = to_next[0] = frag_from0[0]; frag_from0 += 1; frag_left0 -= 1; to_next += 1; n_left_to_next -= 1; vlib_get_buffer (vm, i0)->error = error_node->errors[error0]; vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, n_left_to_next, i0, next0); } vlib_put_next_frame (vm, node, next_index, n_left_to_next); vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); } vec_reset_length (buffer0); } vlib_put_next_frame (vm, node, next_index, n_left_to_next); I must admit I do not fully understand exactly what this code is doing, but I am suspicious of the use of 'vec_reset_length' macro. I have looked at the definition of this and it appears that although this sets the length of the vector back to zero (if the pointer is non-zero), it does not release any memory that may have been allocated. Do we not need a call to 'vec_free' somewhere? Regards, Ben. -- Ben McKeegan Netservers Limited 01223 446000 ext 8103 -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#20520): https://lists.fd.io/g/vpp-dev/message/20520 Mute This Topic: https://lists.fd.io/mt/87095064/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Re: [vpp-dev] det44 plugin
Hi Filip, Thank you for providing the patch. To confirm, I have now tested your patch and it does indeed fix the problem. Regards, Ben. On 02/11/2021 17:46, Filip Varga via lists.fd.io wrote: Hi Ben, Thank you for pointing out the issue. Indeed it looks like the node runs just once. I will provide a patch shortly. Best regards, Filip Varga -Original Message- From: vpp-dev@lists.fd.io On Behalf Of Ben McKeegan Sent: Monday, November 1, 2021 7:24 PM To: vpp-dev@lists.fd.io Subject: [vpp-dev] det44 plugin Hello, I am fairly new to VPP so please bear with me. I am trying to use the det44 NAT plugin on 21.06 but I am experiencing some difficulties with running out of ports. It would appear that my det44 sessions are never removed despite passing the expire time. For example, I have the following setting: show det44 timeout udp timeout: 300sec tcp established timeout: 7440sec tcp transitory timeout: 240sec icmp timeout: 60sec However, if I generate a series of ICMP pings from my test host and then run 'show det44 sessions' I have a session listed for every individual ping packet, as expected, but these remain long after the 60 second timeout configured. For example on my last test I sent a flood of 100 pings which generated 100 sessions in the lists, all "state: icmp-active expire" with expiry times ranging from 171 to 173. I have just sent another 100 pings and now have another 100 sessions with expiry times ranging from 2647 to 2650, and the original 100 sessions are still there still with expiry times from 171 to 173 so these have not been refreshed or expired. I have taken a look at the source code of the plugin and I can that det44_create_expire_walk_process() is called from det44_plugin_enable(). This function appears to start a new vlib process with the 'main loop' function det44_expire_walk_fn(). According to the documentation here https://docs.fd.io/vpp/21.06/dd/d64/vlib__process__doc_8h.html I understand these despatch functions should be implemented as a while (1) {} loop that never ends. However, my reading of the det44_expire_walk_fn() function code is that it will only perform a single walk of the det44 data structures before returning to its caller. Is this a bug in det44_expire_walk_fn(), is the documentation wrong or am I misreading it? My hypothesis is that det44_expire_walk_fn() runs just once, when the plugin is first enabled (and the session table is already empty), and does not get run again thereafter. Therefore, the sessions never get expired. Regards, Ben. -- Ben McKeegan Netservers Limited 01223 446000 ext 8103 -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#20519): https://lists.fd.io/g/vpp-dev/message/20519 Mute This Topic: https://lists.fd.io/mt/86748366/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
[vpp-dev] Memory leak in map plugin
Hello, I have identified a memory leak the ip4_map function of src/plugins/ip4_map.c. I am using the 21.10 release. Enabling memory trace of the main-heap via the debug CLI and backtracing with gdb both point to all the leaked memory being allocated from the vec_add1(buffer0,pi0) macro at line 293 of ip4_map.c. In tests it is leaking approximately 50 bytes for every packet passing through this function (invariant on packet size). Here is an extract of the relevant code: exit: /* Send fragments that were added in the frame */ if (free_original_buffer0) { vlib_buffer_free_one (vm, pi0); /* Free original packet */ } else { vec_add1 (buffer0, pi0); leak is here on line 293 } frag_from0 = buffer0; frag_left0 = vec_len (buffer0); while (frag_left0 > 0) { while (frag_left0 > 0 && n_left_to_next > 0) { u32 i0; i0 = to_next[0] = frag_from0[0]; frag_from0 += 1; frag_left0 -= 1; to_next += 1; n_left_to_next -= 1; vlib_get_buffer (vm, i0)->error = error_node->errors[error0]; vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, n_left_to_next, i0, next0); } vlib_put_next_frame (vm, node, next_index, n_left_to_next); vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); } vec_reset_length (buffer0); } vlib_put_next_frame (vm, node, next_index, n_left_to_next); I must admit I do not fully understand exactly what this code is doing, but I am suspicious of the use of 'vec_reset_length' macro. I have looked at the definition of this and it appears that although this sets the length of the vector back to zero (if the pointer is non-zero), it does not release any memory that may have been allocated. Do we not need a call to 'vec_free' somewhere? Regards, Ben. -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#20496): https://lists.fd.io/g/vpp-dev/message/20496 Mute This Topic: https://lists.fd.io/mt/87095064/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Recommended NICs and drivers (was Re: [vpp-dev] About Risc-V Porting)
Hi, On 08/11/2021 19:15, Damjan Marion via lists.fd.io wrote: and that VPP works better when DPDK is not involved. Apologies for hijacking the thread but as a new VPP user who has spent many many hours reading through every scrap of documentation I can find, this statement comes as a surprise to me. Unfortunately a lot of the VPP documentation seems a bit out of date. Indeed, the documentation for most the recent release still says it 'Leverages best-of-breed open source driver technology: DPDK': https://s3-docs.fd.io/vpp/21.10/whatisvpp/performance.html So if DPDK isn't the best option any more, what do the people of this list recommend as a NIC and driver combination, and why? (Unfortunately, I don't have a huge budget at my disposal, so I'm looking for some best value-for-money options.) Thanks, Ben. -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#20460): https://lists.fd.io/g/vpp-dev/message/20460 Mute This Topic: https://lists.fd.io/mt/86928356/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
[vpp-dev] det44 plugin
Hello, I am fairly new to VPP so please bear with me. I am trying to use the det44 NAT plugin on 21.06 but I am experiencing some difficulties with running out of ports. It would appear that my det44 sessions are never removed despite passing the expire time. For example, I have the following setting: show det44 timeout udp timeout: 300sec tcp established timeout: 7440sec tcp transitory timeout: 240sec icmp timeout: 60sec However, if I generate a series of ICMP pings from my test host and then run 'show det44 sessions' I have a session listed for every individual ping packet, as expected, but these remain long after the 60 second timeout configured. For example on my last test I sent a flood of 100 pings which generated 100 sessions in the lists, all "state: icmp-active expire" with expiry times ranging from 171 to 173. I have just sent another 100 pings and now have another 100 sessions with expiry times ranging from 2647 to 2650, and the original 100 sessions are still there still with expiry times from 171 to 173 so these have not been refreshed or expired. I have taken a look at the source code of the plugin and I can that det44_create_expire_walk_process() is called from det44_plugin_enable(). This function appears to start a new vlib process with the 'main loop' function det44_expire_walk_fn(). According to the documentation here https://docs.fd.io/vpp/21.06/dd/d64/vlib__process__doc_8h.html I understand these despatch functions should be implemented as a while (1) {} loop that never ends. However, my reading of the det44_expire_walk_fn() function code is that it will only perform a single walk of the det44 data structures before returning to its caller. Is this a bug in det44_expire_walk_fn(), is the documentation wrong or am I misreading it? My hypothesis is that det44_expire_walk_fn() runs just once, when the plugin is first enabled (and the session table is already empty), and does not get run again thereafter. Therefore, the sessions never get expired. Regards, Ben. -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#20402): https://lists.fd.io/g/vpp-dev/message/20402 Mute This Topic: https://lists.fd.io/mt/86748366/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-