Hi,
I have recently been experimenting with the linux-cp plugin to run BGP.
My intention is to implement failover and load-sharing of various
networking functions, such as map or det44, among pools of VPP
instances. (The map plugin's protocols are ideally suited to BGP
anycasting, whereas for det44 I am running VRRP on the inside interface
and will be adjusting my BGP advertisements of the outside addresses
based on the inside VRRP status.)
linux-cp works a treat on its own (v21.10) and I have no trouble
establishing BGP sessions using FRR, but I've been hitting some issues
when enabling extra plugins on the same interfaces that my BGP control
plane is using.
Firstly, enabling the map plugin appears to break IPv6 connectivity: the
control plane can no longer successfully do NDP to the external gateway
(a layer 3 switch). NDP replies from the gateway to the control plane
do not arrive. There is a very simple workaround: if I put in a static
neighbour entry in Linux (with 'ip neigh replace ...') everything else
works. I have not yet understood why this happens; as I have a
workaround, I did not spend long investigating it.
Secondly, with the det44 plugin enabled (on a different instance), all
IPv4 connectivity to the control plane was broken. In my case
(running BGP only on the 'outside' interface) this is because the
det44_out2in_node function is rather overzealous: it grabs all IPv4
packets received and attempts to translate them, and if it has no
matching mapping configuration for the destination address it will drop
the packet.
I have come up with a very simple fix which works for me: if the
destination address belongs to the receiving interface, then skip the
TTL check and NAT:
diff --git a/src/plugins/nat/det44/det44_out2in.c b/src/plugins/nat/det44/det44_out2in.c
index 111bc61c4..e128b794e 100644
--- a/src/plugins/nat/det44/det44_out2in.c
+++ b/src/plugins/nat/det44/det44_out2in.c
@@ -425,6 +425,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm,
tcp0 = (tcp_header_t *) udp0;
sw_if_index0 = vnet_buffer (b0)->sw_if_index[VLIB_RX];
+ /* do not interfere with packets to the interface address (control plane) */
+ if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index0,
+ ip0->dst_address.as_u32)))
+ goto trace0;
if (PREDICT_FALSE (ip0->ttl == 1))
{
@@ -543,6 +547,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm,
tcp1 = (tcp_header_t *) udp1;
sw_if_index1 = vnet_buffer (b1)->sw_if_index[VLIB_RX];
+ /* do not interfere with packets to the interface address (control plane) */
+ if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index1,
+ ip1->dst_address.as_u32)))
+ goto trace1;
if (PREDICT_FALSE (ip1->ttl == 1))
{
@@ -688,6 +696,10 @@ VLIB_NODE_FN (det44_out2in_node) (vlib_main_t * vm,
tcp0 = (tcp_header_t *) udp0;
sw_if_index0 = vnet_buffer (b0)->sw_if_index[VLIB_RX];
+ /* do not interfere with packets to the interface address (control plane) */
+ if (PREDICT_FALSE (!det44_is_interface_addr (node, sw_if_index0,
+ ip0->dst_address.as_u32)))
+ goto trace00;
if (PREDICT_FALSE (ip0->ttl == 1))
{
This works well for me as I have BGP advertise routes to the outside
prefixes of my det44 mappings, with the non-overlapping interface
address as the next hop.
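
To make the resulting order of checks easier to see, here is a small
standalone model of the per-packet decision after the patch. It is only
a sketch and not VPP code: pkt_t, verdict_t, classify_patched and the
has_mapping flag are hypothetical stand-ins for the real buffer
metadata, next-node selection and mapping lookup, and only the first
interface address is compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts; the real node selects a next node instead. */
typedef enum
{
  VERDICT_PASS_UNTOUCHED,   /* left alone, e.g. control-plane traffic */
  VERDICT_TTL_CHECK_FAILED, /* caught by the ttl == 1 check */
  VERDICT_DROP_NO_MAPPING,  /* no det44 mapping for the destination */
  VERDICT_TRANSLATE         /* mapping found, packet is NATed */
} verdict_t;

/* Hypothetical packet summary, not a VPP structure. */
typedef struct
{
  uint32_t dst_addr; /* destination IPv4 address */
  uint8_t ttl;
  int has_mapping;   /* nonzero if a det44 mapping covers dst_addr */
} pkt_t;

/* Patched order: the interface-address test runs before the TTL check
   and before the mapping lookup. */
static verdict_t
classify_patched (const pkt_t *p, uint32_t first_if_addr)
{
  assert (p != NULL);

  if (p->dst_addr == first_if_addr)
    return VERDICT_PASS_UNTOUCHED;
  if (p->ttl == 1)
    return VERDICT_TTL_CHECK_FAILED;
  if (!p->has_mapping)
    return VERDICT_DROP_NO_MAPPING;
  return VERDICT_TRANSLATE;
}
```

In this model a packet addressed to the first interface address is never
translated, even if a mapping covers that address.
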
The downside of this patch is that it would break anyone relying on
being able to NAT using the first interface address, rather than a
routed prefix or secondary address. Is that a use case
we should try to retain support for? (Based on my recent discovery of
the long standing session scavenging bug in this plugin, I suspect there
are few if any other serious users of it.)
My first attempt at the patch put the interface check after the mapping
table lookup, inside the "if (PREDICT_FALSE (!mp0))" clauses, so it
would still be possible to NAT the interface's address if you did not
need to use it for control plane traffic. However, I found my BGP
packets then fell foul of the TTL check in this node. Thus to continue
to support NAT of the first interface address would require additional
refactoring of the code to move the TTL check after the map lookup, and
special handling for the ICMP code path too. I feel this would add too
much complexity to the code for little gain.
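
For comparison, the ordering of that first attempt can be modelled in
the same simplified way. Again this is a sketch, not the plugin's code:
rx_pkt_t, outcome_t and classify_first_attempt are hypothetical names.
It assumes the BGP packets arrive with TTL 1, which is common for
directly connected eBGP sessions but is my assumption about why they hit
the TTL check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes; the real node selects a next node instead. */
typedef enum
{
  OUTCOME_PASS_UNTOUCHED,
  OUTCOME_TTL_CHECK_FAILED,
  OUTCOME_DROP_NO_MAPPING,
  OUTCOME_TRANSLATE
} outcome_t;

/* Hypothetical packet summary, not a VPP structure. */
typedef struct
{
  uint32_t dst_addr; /* destination IPv4 address */
  uint8_t ttl;
  int has_mapping;   /* nonzero if a det44 mapping covers dst_addr */
} rx_pkt_t;

/* First-attempt order: the TTL check still runs first, and the
   interface-address test is reached only when no mapping matches. */
static outcome_t
classify_first_attempt (const rx_pkt_t *p, uint32_t first_if_addr)
{
  assert (p != NULL);

  if (p->ttl == 1)
    return OUTCOME_TTL_CHECK_FAILED; /* catches TTL-1 control traffic */
  if (!p->has_mapping)
    {
      if (p->dst_addr == first_if_addr)
        return OUTCOME_PASS_UNTOUCHED;
      return OUTCOME_DROP_NO_MAPPING;
    }
  return OUTCOME_TRANSLATE; /* interface address can still be NATed */
}
```

With this ordering a TTL-1 packet to the interface address never reaches
the address test, which is why keeping NAT of the first interface
address would mean moving the TTL check as well.
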
What do people think?
Regards,
Ben.