Hi,

I have recently been experimenting with the linux-cp plugin to run BGP. My intention is to implement failover and load-sharing of various networking functions such as map or det44 among pools of VPP instances. (The map plugin's protocols are ideally suited to BGP anycasting, whereas for det44 I am running VRRP on the inside interface and will be adjusting my BGP advertisements of the outside addresses based on the inside VRRP status.)

linux-cp works a treat on its own (v21.10) and I have no trouble establishing BGP sessions using FRR, but I've been hitting some issues when enabling extra plugins on the same interfaces that my BGP control plane is using.

Firstly, enabling the map plugin appears to break IPv6 connectivity: the control plane can no longer complete NDP with the external gateway (a layer 3 switch), because NDP replies from the gateway do not arrive at the control plane. There is a very simple workaround: if I put a static neighbour entry into Linux (with 'ip neigh replace ...'), everything else works. I have not yet understood why this happens, although as I have a workaround I did not spend too long investigating it.

Secondly, with the det44 plugin enabled (on a different instance), all IPv4 connectivity to the control plane was broken. In my case (running BGP only on the 'outside' interface) this is because the det44_out2in_node function is rather overzealous: it grabs all IPv4 packets received and attempts to translate them, and if it has no matching mapping configuration for the destination address it drops the packet.

I have come up with a very simple fix which works for me: if the destination address belongs to the receiving interface, then skip the TTL check and NAT:


diff --git a/src/plugins/nat/det44/det44_out2in.c b/src/plugins/nat/det44/det44_out2in.c
index 111bc61c4..e128b794e 100644
--- a/src/plugins/nat/det44/det44_out2in.c
+++ b/src/plugins/nat/det44/det44_out2in.c
@@ -425,6 +425,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm,
       tcp0 = (tcp_header_t *) udp0;

       sw_if_index0 = vnet_buffer (b0)->sw_if_index[VLIB_RX];
+      /* do not interfere with packets to the interface address (control plane) */
+      if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index0,
+                                                   ip0->dst_address.as_u32)))
+        goto trace0;

       if (PREDICT_FALSE (ip0->ttl == 1))
        {
@@ -543,6 +547,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm,
       tcp1 = (tcp_header_t *) udp1;

       sw_if_index1 = vnet_buffer (b1)->sw_if_index[VLIB_RX];
+      /* do not interfere with packets to the interface address (control plane) */
+      if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index1,
+                                                   ip1->dst_address.as_u32)))
+        goto trace1;

       if (PREDICT_FALSE (ip1->ttl == 1))
        {
@@ -688,6 +696,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm,
       tcp0 = (tcp_header_t *) udp0;

       sw_if_index0 = vnet_buffer (b0)->sw_if_index[VLIB_RX];
+      /* do not interfere with packets to the interface address (control plane) */
+      if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index0,
+                                                   ip0->dst_address.as_u32)))
+        goto trace00;

       if (PREDICT_FALSE (ip0->ttl == 1))
        {

This works well for me as I have BGP advertise routes to the outside prefixes of my det44 mappings, with the non-overlapping interface address as the next hop.

The downside of this patch is that it would break things for anyone relying on being able to NAT using the first interface address, rather than a routed prefix or secondary address. Is that a use case we should try to retain support for? (Based on my recent discovery of the long-standing session scavenging bug in this plugin, I suspect there are few if any other serious users of it.)

My first attempt at the patch put the interface check after the mapping table lookup, inside the "if (PREDICT_FALSE (!mp0))" clauses, so it would still be possible to NAT the interface's address if you did not need to use it for control plane traffic. However, I found my BGP packets then fell foul of the TTL check in this node. Thus, continuing to support NAT of the first interface address would require additional refactoring of the code to move the TTL check after the map lookup, plus special handling for the ICMP code path. I feel this would add too much complexity to the code for little gain.
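
To make the ordering issue concrete, here is a rough standalone model of the two variants. It is not VPP code: the types, function names, verdicts and addresses are all made up for illustration, the ICMP path is left out, and it assumes the BGP packets arrive with TTL 1 (the usual default for single-hop eBGP). It only shows why a check placed after the map lookup comes too late for such packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical verdicts; the real node selects a next-node index instead. */
typedef enum {
  VERDICT_LOCAL,       /* left alone for normal IPv4 processing (control plane) */
  VERDICT_TTL_EXPIRED, /* rejected by the TTL == 1 check */
  VERDICT_TRANSLATED,  /* matched a mapping and was translated */
  VERDICT_DROP         /* no mapping for the destination address */
} verdict_t;

typedef struct {
  uint32_t dst; /* destination IPv4 address, host byte order */
  uint8_t ttl;
} pkt_t;

typedef struct {
  uint32_t if_addr;    /* first address of the receiving interface */
  uint32_t out_prefix; /* outside prefix of a single mapping */
  uint32_t out_mask;
} cfg_t;

static bool is_if_addr(const cfg_t *c, const pkt_t *p)
{
  return p->dst == c->if_addr;
}

static bool has_mapping(const cfg_t *c, const pkt_t *p)
{
  return (p->dst & c->out_mask) == c->out_prefix;
}

/* Ordering of the posted patch: interface check before TTL check and NAT. */
verdict_t out2in_patched(const cfg_t *c, const pkt_t *p)
{
  if (is_if_addr(c, p))
    return VERDICT_LOCAL;
  if (p->ttl == 1)
    return VERDICT_TTL_EXPIRED;
  return has_mapping(c, p) ? VERDICT_TRANSLATED : VERDICT_DROP;
}

/* Ordering of the first attempt: interface check only on a mapping miss. */
verdict_t out2in_first_attempt(const cfg_t *c, const pkt_t *p)
{
  if (p->ttl == 1)
    return VERDICT_TTL_EXPIRED;
  if (!has_mapping(c, p))
    return is_if_addr(c, p) ? VERDICT_LOCAL : VERDICT_DROP;
  return VERDICT_TRANSLATED;
}

/* Example checks using documentation addresses (all illustrative). */
void check_orderings(void)
{
  /* interface 192.0.2.1, outside prefix 198.51.100.0/24 */
  cfg_t c = { 0xC0000201u, 0xC6336400u, 0xFFFFFF00u };
  pkt_t bgp = { 0xC0000201u, 1 };     /* to the interface address, TTL 1 */
  pkt_t natted = { 0xC6336407u, 64 }; /* 198.51.100.7, inside the prefix */
  pkt_t stray = { 0xCB007109u, 64 };  /* 203.0.113.9, no mapping */

  /* Only the BGP-style packet is treated differently by the two orderings. */
  assert(out2in_patched(&c, &bgp) == VERDICT_LOCAL);
  assert(out2in_first_attempt(&c, &bgp) == VERDICT_TTL_EXPIRED);

  assert(out2in_patched(&c, &natted) == VERDICT_TRANSLATED);
  assert(out2in_first_attempt(&c, &natted) == VERDICT_TRANSLATED);
  assert(out2in_patched(&c, &stray) == VERDICT_DROP);
  assert(out2in_first_attempt(&c, &stray) == VERDICT_DROP);
}
```

In this model the two orderings only disagree for a TTL 1 packet addressed to the interface itself, which is exactly the control plane case. Note that the model's first-attempt variant would still allow the interface address to be covered by a mapping, which is the capability the posted patch gives up.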

What do people think?

Regards,
Ben.
