Public bug reported:

Hey there,

I have been reading through the docs extensively for references on how
the SNAT and DNAT flows work, to try to understand the problem behind
the issues reported in the links below:

https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718
https://mail.openvswitch.org/pipermail/ovs-dev/2021-August/386720.html

I can see these same log messages "kernel: openvswitch: ovs-system:
deferred action limit reached, drop recirc action" on gateway nodes in
my OpenStack installation.
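
To gauge how often this happens on a gateway node, the kernel log can be
scanned for the message quoted above. The following is only a minimal
sketch: the function name count_deferred_drops is hypothetical, and the
pattern assumes the log line has the same shape as the one quoted in
this report.

```python
import re
import sys
from collections import Counter

# Assumed line shape, taken from the message quoted in this report:
#   kernel: openvswitch: ovs-system: deferred action limit reached, drop recirc action
PATTERN = re.compile(
    r"openvswitch: (?P<dp>\S+): deferred action limit reached, "
    r"drop recirc action"
)


def count_deferred_drops(lines):
    """Count matching log lines per datapath name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("dp")] += 1
    return counts


if __name__ == "__main__":
    # Reads kernel log lines from standard input.
    for datapath, hits in count_deferred_drops(sys.stdin).items():
        print(f"{datapath}: {hits}")
```

Saved under a hypothetical name such as count_drops.py, it could be fed
with something like "journalctl -k | python3 count_drops.py".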

The main problem is that TCP/UDP traffic sent to the address of an LRP
port that is not part of any SNAT/DNAT conversation keeps recirculating
in the OVS data plane until its TTL reaches 0.

The message appears in the kernel log because the deferred-action FIFO
(sized by "DEFERRED_ACTION_FIFO_SIZE") fills up, but this is only a
consequence of the packets not matching the flow tables of the
datapath. See net/openvswitch/actions.c in the kernel source.
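
As a rough illustration of why a looping packet trips this limit, here
is a toy model of a bounded FIFO. It is not the kernel code: the size of
10 is an assumption standing in for DEFERRED_ACTION_FIFO_SIZE, the names
DeferredFifo and simulate are hypothetical, and the model assumes one
recirculation is queued per TTL decrement without the queue being
drained in between.

```python
from collections import deque

# Assumption: stand-in value for the kernel's DEFERRED_ACTION_FIFO_SIZE.
FIFO_SIZE = 10


class DeferredFifo:
    """Toy bounded queue; refuses new entries once it is full."""

    def __init__(self, size=FIFO_SIZE):
        self.size = size
        self.queue = deque()
        self.dropped = 0

    def defer(self, action):
        # When full, the action is dropped. This is the point where the
        # real datapath would log "deferred action limit reached".
        if len(self.queue) >= self.size:
            self.dropped += 1
            return False
        self.queue.append(action)
        return True


def simulate(ttl, size=FIFO_SIZE):
    """Queue one recirc per TTL decrement; return (queued, dropped)."""
    fifo = DeferredFifo(size)
    while ttl > 0:
        fifo.defer(("recirc", ttl))
        ttl -= 1
    return len(fifo.queue), fifo.dropped
```

Under these assumptions, simulate(64) queues 10 recirculations and drops
the remaining 54, which is why simply raising the FIFO size would only
hide the symptom rather than stop the loop.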

I can reproduce this on a local OVN/OVS installation built from the OVN
main/master branch and its OVS submodule (GitHub projects). The problem
also occurs with all the latest released tags of OVN and OVS for Ubuntu
20.04 LTS.

Basically, it only happens when there is a SNAT rule to translate an
entire network (masquerade) and the return traffic does not have an open
port. If a DNAT is used for a specific host (even if the ports have not
been mapped, but if there is a 'host' to redirect the DNAT), the traffic
is forwarded and is not sent via netlink through the slowpath until it
is dropped.

The patch proposed by Krzysztof Klimonda aims to modify the flow table
via OVN by inserting a drop rule for the traffic related to this issue.
The patch was not accepted by the project, but it got me wondering how
to solve this problem (I can't just increase the kernel
DEFERRED_ACTION_FIFO_SIZE). The proposed patch is very old and does not
apply to the current code structure. I tried to adapt it from
ovn-northd.c to the new northd/northd.c layout and applied it to
upstream, but the problem still occurs.
ovn_upstream.txt[https://github.com/openvswitch/ovs-issues/files/8798982/ovn_upstream.txt]

I believe the patch does not solve the problem, because I keep seeing
the same messages in the kernel log.

Do you have any ideas on how to solve this problem?

I am adding a reproducer for this issue in the attached file.
issue_reproducer.txt[https://github.com/openvswitch/ovs-issues/files/8798161/issue_reproducer.txt]


Kind regards,
Roberto

** Affects: openvswitch
     Importance: Unknown
         Status: Unknown

** Affects: ovn (Ubuntu)
     Importance: Undecided
         Status: New

** Bug watch added: github.com/openvswitch/ovs-issues/issues #255
   https://github.com/openvswitch/ovs-issues/issues/255

** Also affects: openvswitch via
   https://github.com/openvswitch/ovs-issues/issues/255
   Importance: Unknown
       Status: Unknown

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1976285

Title:
  SNAT/DNAT - Traffic sent to LRP port recirculate until TTL=0 (drop
  recirc action)

To manage notifications about this bug go to:
https://bugs.launchpad.net/openvswitch/+bug/1976285/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
