On Fri, May 27, 2022 at 10:04 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/26/22 14:53, Frode Nordahl wrote:
> >
> >
> > On Thu, May 26, 2022 at 2:45 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >
> >     On 5/26/22 13:00, Frode Nordahl wrote:
> >     > On Wed, May 25, 2022 at 9:55 AM Frode Nordahl <frode.nord...@canonical.com> wrote:
> >     >>
> >     >> On Tue, May 24, 2022 at 1:32 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >     >>>
> >     >>> On 5/24/22 12:54, Frode Nordahl wrote:
> >     >>>> On Mon, May 23, 2022 at 3:49 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >     >>>>>
> >     >>>>> On 5/21/22 12:49, Frode Nordahl wrote:
> >     >>>>>> On Thu, May 19, 2022 at 3:39 PM Frode Nordahl <frode.nord...@canonical.com> wrote:
> >     >>>>>>>
> >     >>>>>>> On Sat, May 14, 2022 at 2:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >     >>>>>>>>
> >     >>>>>>>> On 5/13/22 10:36, Frode Nordahl wrote:
> >     >>>>>>>>> On Fri, Mar 11, 2022 at 2:04 PM Liam Young <liam.yo...@canonical.com> wrote:
> >     >>>>>>>>>>
> >     >>>>>>>>>> Hi,
> >     >>>>>>>>>>
> >     >>>>>>>>>> <tl;dr> Commit 355fef6f2 seems to break connectivity in my 
> > setup</tl;dr>
> >     >>>>>>>>>
> >     >>>>>>>>> After OVN started colocating sNAT and dNAT operations in the 
> > same CT
> >     >>>>>>>>> zone [0], the above-mentioned OVS commit appears to also break 
> > OVN [1].
> >     >>>>>>>>> I have been discussing this with Numan, and he has found a 
> > correlation with
> >     >>>>>>>>> the behavior of the Open vSwitch kernel datapath conntrack
> >     >>>>>>>>> `skb_nfct_cached` function, i.e. if that is changed to always 
> > return
> >     >>>>>>>>> 'false' the erratic behavior disappears.
> >     >>>>>>>>>
> >     >>>>>>>>> So I guess the question then becomes: is this an OVS userspace 
> > or OVS
> >     >>>>>>>>> kernel problem?
> >     >>>>>>>>>
> >     >>>>>>>>> We have a reproducer in [3].
> >     >>>>>>>>>
> >     >>>>>>>>> 0: https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a
> >     >>>>>>>>> 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
> >     >>>>>>>>> 2: https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683
> >     >>>>>>>>> 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
> >     >>>>>>>>>
> >     >>>>>>>>
> >     >>>>>>>> Hrm.  I think there is a logical bug in the implementations of the 
> > ct()
> >     >>>>>>>> datapath action in both datapaths.
> >     >>>>>>>>
> >     >>>>>>>> The problem appears when the OpenFlow pipeline for the packet 
> > contains
> >     >>>>>>>> several ct() actions.  NXAST_CT action (i.e. every ct() 
> > action) must
> >     >>>>>>>> always put the packet into an untracked state.  Tracked state 
> > will be
> >     >>>>>>>> set only in the 'recirc_table' where the packet is cloned by 
> > the ct()
> >     >>>>>>>> action for further processing.
> >     >>>>>>>>
> >     >>>>>>>> If an OF pipeline looks like this:
> >     >>>>>>>>
> >     >>>>>>>>   actions=ct(),something,something,ct(),something
> >     >>>>>>>>
> >     >>>>>>>> each action must be entered with the 'untracked' conntrack 
> > state in
> >     >>>>>>>> the packet metadata.
> >     >>>>>>>>
> >     >>>>>>>> However, the ct() action in the datapath, unlike OpenFlow, doesn't 
> > work
> >     >>>>>>>> this way.  It modifies the conntrack state for the packet it 
> > is processing.
> >     >>>>>>>> During the flow translation OVS inserts recirculation right 
> > after the
> >     >>>>>>>> datapath ct() action.  This way conntrack state affects only 
> > processing
> >     >>>>>>>> after recirculation.  Sort of.
> >     >>>>>>>>
> >     >>>>>>>> The issue is that not all OpenFlow ct() actions have 
> > recirc_table
> >     >>>>>>>> specified.  These actions are supposed to change the state of the 
> > conntrack
> >     >>>>>>>> subsystem, but they still must leave the packet itself in the 
> > untracked
> >     >>>>>>>> state with no strings attached.  But since datapath ct() 
> > actions don't
> >     >>>>>>>> work that way and since recirculation is not inserted (no 
> > recirc_table
> >     >>>>>>>> specified), packet after conntrack execution carries the state 
> > along to
> >     >>>>>>>> all other actions.
> >     >>>>>>>> This doesn't impact normal actions, but seems to break 
> > subsequent ct()
> >     >>>>>>>> actions on the same pipeline.
> >     >>>>>>>>
> >     >>>>>>>> In general, it looks to me that the ct() action in the datapath is
> >     >>>>>>>> not supposed to cache any connection data inside the packet's 
> > metadata.
> >     >>>>>>>> This seems to be the root cause of the problem.  Fields that 
> > OF knows
> >     >>>>>>>> about should not trigger any issues if carried along with a 
> > packet,
> >     >>>>>>>> but datapath-specific metadata can not be cleared, because 
> > OFproto
> >     >>>>>>>> simply doesn't know about it.
> >     >>>>>>>>
> >     >>>>>>>> But, I guess, not caching the connection and other internal 
> > structures
> >     >>>>>>>> might significantly impact datapath performance.  Will it?
> >     >>>>>>>>
> >     >>>>>>>> Looking at the reproducer in [3], it has, for example, the 
> > flow with the
> >     >>>>>>>> following actions:
> >     >>>>>>>>
> >     >>>>>>>>   actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
> >     >>>>>>>>           
> > set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
> >     >>>>>>>>           set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
> >     >>>>>>>>           ct(zone=8),recirc(0x4)
> >     >>>>>>>>
> >     >>>>>>>> So, if the first ct() changes some datapath-specific 
> > packet metadata,
> >     >>>>>>>> the second ct() may try to use that information.
> >     >>>>>>>>
> >     >>>>>>>> It looks like during the flow translation we must add a ct_clear 
> > action
> >     >>>>>>>> explicitly after every ct() action, unless it was the last 
> > action in
> >     >>>>>>>> the list.  This will end up adding a lot of ct_clear 
> > actions
> >     >>>>>>>> everywhere.
> >     >>>>>>>>
> >     >>>>>>>> Another option is the patch below that tries to track if 
> > ct_clear
> >     >>>>>>>> is required and adds that action before the next ct() action in
> >     >>>>>>>> the datapath.
> >     >>>>>>>>
> >     >>>>>>>> BTW, the test [3] fails with both kernel and userspace 
> > datapaths.
> >     >>>>>>>>
> >     >>>>>>>> The change below should fix the problem, I think.
> >     >>>>>>>> It looks like we also have to put ct_clear action before 
> > translating
> >     >>>>>>>> output to the patch port if we're in 'conntracked' state.
> >     >>>>>>>>
> >     >>>>>>>> And I do not know how to fix the problem for kernels without 
> > ct_clear
> >     >>>>>>>> support.  I don't think we can clear metadata that is internal 
> > to
> >     >>>>>>>> the kernel.  Ideas are welcome.
> >     >>>>>>>>
> >     >>>>>>>> CC: Aaron, Paolo.
> >     >>>>>>>>
> >     >>>>>>>> Any thoughts?
> >     >>>>>>>
> >     >>>>>>> Thank you so much for the detailed explanation of what's going 
> > on and
> >     >>>>>>> for providing a proposed fix.
> >     >>>>>>>
> >     >>>>>>> I took it for a spin and it does indeed appear to fix the OVN 
> > hairpin
> >     >>>>>>> issue, but it unfortunately does not appear to fix the issue 
> > Liam
> >     >>>>>>> reported for the ML2/OVS use case [4].
> >     >>>>>>>
> >     >>>>>>> When trying that use case with your patch I see this in the 
> > Open vSwitch log:
> >     >>>>>>> 
> > 2022-05-19T08:17:02.668Z|00001|ofproto_dpif_xlate(handler1)|WARN|over
> >     >>>>>>> max translation depth 64 on bridge br-int while processing
> >     >>>>>>> 
> > ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=3,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
> >     >>>>>>> 
> > 2022-05-19T08:17:02.668Z|00002|ofproto_dpif_upcall(handler1)|WARN|Dropped
> >     >>>>>>> 2 log messages in last 205 seconds (most recently, 187 seconds 
> > ago)
> >     >>>>>>> due to excessive rate
> >     >>>>>>> 
> > 2022-05-19T08:17:02.668Z|00003|ofproto_dpif_upcall(handler1)|WARN|Flow:
> >     >>>>>>> 
> > ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=5,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
> >     >>>>>>>
> >     >>>>>>> bridge("br-int")
> >     >>>>>>> ----------------
> >     >>>>>>>  0. priority 0, cookie 0x56d6d58c018b51bd
> >     >>>>>>>     goto_table:60
> >     >>>>>>> 60. priority 3, cookie 0x56d6d58c018b51bd
> >     >>>>>>>     NORMAL
> >     >>>>>>>      >>>> received packet on unknown port 5 <<<<
> >     >>>>>>>      >> no input bundle, dropping
> >     >>>>>>>
> >     >>>>>>> Final flow: unchanged
> >     >>>>>>> Megaflow: 
> > recirc_id=0,eth,ipv6,in_port=5,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,nw_frag=no
> >     >>>>>>> Datapath actions: drop
> >     >>>>>>>
> >     >>>>>>>
> >     >>>>>>> On the back of your explanation and the log output I made an 
> > attempt
> >     >>>>>>> at adding this patch on top of yours:
> >     >>>>>>> ---
> >     >>>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c 
> > b/ofproto/ofproto-dpif-xlate.c
> >     >>>>>>> index 110dab0ec..2955e8e4d 100644
> >     >>>>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >     >>>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >     >>>>>>> @@ -7144,6 +7144,7 @@ do_xlate_actions(const struct ofpact 
> > *ofpacts,
> >     >>>>>>> size_t ofpacts_len,
> >     >>>>>>>           * resubmit to the frozen actions.
> >     >>>>>>>           */
> >     >>>>>>>          case OFPACT_RESUBMIT:
> >     >>>>>>> +            ctx->pending_ct_clear = true;
> >     >>>>>>>              xlate_ofpact_resubmit(ctx, ofpact_get_RESUBMIT(a), 
> > last);
> >     >>>>>>>              continue;
> >     >>>>>>>          case OFPACT_GOTO_TABLE:
> >     >>>>>>> ---
> >     >>>>>>>
> >     >>>>>>> And the two patches together do appear to resolve the issue 
> > reported
> >     >>>>>>> in [4] as well as the OVN hairpin issue [1]. It does however 
> > make a
> >     >>>>>>> couple of tests fail, so I need to look into whether that is 
> > expected from
> >     >>>>>>> the change or if the approach must be changed.
> >     >>>>>>>
> >     >>>>>>> 4: https://bugs.launchpad.net/openvswitch/+bug/1964117
> >     >>>>>>
> >     >>>>>> The following patch also works and does not cause any tests to 
> > fail.
> >     >>>>>>
> >     >>>>>> ---
> >     >>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c 
> > b/ofproto/ofproto-dpif-xlate.c
> >     >>>>>> index 110dab0ec..905f6994d 100644
> >     >>>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >     >>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >     >>>>>> @@ -7354,7 +7354,9 @@ do_xlate_actions(const struct ofpact 
> > *ofpacts,
> >     >>>>>> size_t ofpacts_len,
> >     >>>>>>              break;
> >     >>>>>>
> >     >>>>>>          case OFPACT_CT_CLEAR:
> >     >>>>>> -            if (ctx->conntracked || ctx->pending_ct_clear) {
> >     >>>>>> +            if (ctx->conntracked || ctx->pending_ct_clear
> >     >>>>>> +                || (flow->ct_state && flow->ct_state & 
> > CS_TRACKED))
> >     >>>>>> +            {
> >     >>>>>>                  compose_ct_clear_action(ctx);
> >     >>>>>>              }
> >     >>>>>>              break;
> >     >>>>>> ---
> >     >>>>>>
> >     >>>>>> The OpenStack ML2/OVS Open vSwitch firewall driver appears to 
> > make
> >     >>>>>> explicit use of the ct_clear action as part of progressing packets 
> > through
> >     >>>>>> the tables, and the optimization in 355fef6f2 interrupts that 
> > workflow.
> >     >>>>>>
> >     >>>>>> I am not quite sure if the definition of `struct
> >     >>>>>> xlate_ctx->conntracked` is "this flow has been exposed to 
> > conntrack"
> >     >>>>>> or if it is "conntrack sees this flow as established". If anyone 
> > can
> >     >>>>>> shed light on that, we would know if the above patch is ok as is 
> > or if
> >     >>>>>> this flow state should have set struct xlate_ctx->conntracked to 
> > true
> >     >>>>>> in the first place.
> >     >>>>>
> >     >>>>> I believe that ctx->conntracked should mean that "this flow has 
> > been exposed
> >     >>>>> to conntrack".  And it looks like the condition:
> >     >>>>>  (!ctx->conntracked && flow->ct_state & CS_TRACKED)
> >     >>>>> should not be possible.  We're probably missing a
> > clear_conntrack() call
> >     >>>>> somewhere.
> >     >>>>>
> >     >>>>> One thing I seem to be missing in my patch is the conntrack clear after 
> > returning
> >     >>>>> from the patch port processing:
> >     >>>>>
> >     >>>>> diff --git a/ofproto/ofproto-dpif-xlate.c 
> > b/ofproto/ofproto-dpif-xlate.c
> >     >>>>> index 110dab0ec..53d4f78b2 100644
> >     >>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >     >>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >     >>>>> @@ -3955,8 +3955,13 @@ patch_port_output(struct xlate_ctx *ctx, 
> > const struct xport *in_dev,
> >     >>>>>
> >     >>>>>      /* The out bridge's conntrack execution should have no 
> > effect on the
> >     >>>>>       * original bridge. */
> >     >>>>> -    ctx->conntracked = old_conntrack;
> >     >>>>>      ctx->pending_ct_clear = old_ct_clear;
> >     >>>>> +    if (ctx->conntracked) {
> >     >>>>> +        /* Conntrack was involved in the other bridge.  We need 
> > to clear
> >     >>>>> +         * whatever information was cached in the datapath. */
> >     >>>>> +        ctx->pending_ct_clear = true;
> >     >>>>> +    }
> >     >>>>> +    ctx->conntracked = old_conntrack;
> >     >>>>>
> >     >>>>>      /* The fact that the out bridge exits (for any reason) does 
> > not mean
> >     >>>>>       * that the original bridge should exit.  Specifically, if 
> > the out
> >     >>>>> ---
> >     >>>>>
> >     >>>>> But that doesn't seem to be related to the (!ctx->conntracked &&
> >     >>>>> flow->ct_state & CS_TRACKED) case...
> >     >>>>
> >     >>>> Thank you for confirming that ctx->conntracked should have been 
> > set in
> >     >>>> this flow state; that led me to the possible root cause and fix of this
> >     >>>> particular use case.
> >     >>>>
> >     >>>> Enabling debug logging for the dpif module while initiating a new flow, I 
> > see
> >     >>>> that the flow can enter the system as a miss upcall without
> >     >>>> recirculation and with ct_state set:
> >     >>>> 
> > 2022-05-24T10:42:57.869Z|00036|dpif(handler6)|DBG|system@ovs-system:
> >     >>>> miss upcall:
> >     >>>> 
> > recirc_id(0),dp_hash(0),skb_priority(0),in_port(6),skb_mark(0),ct_state(0x21),ct_zone(0),ct_mark(0),ct_label(0),ct_tuple6(src=fc00:a3f0:723f:7bc4:f816:3eff:fe81:224d,dst=fc00:a3f0:723f:7bc4:f816:3eff:fee5:e7b2,proto=58,src_port=128,dst_port=0),eth(src=fa:16:3e:81:22:4d,dst=fa:16:3e:e5:e7:b2),eth_type(0x86dd),ipv6(src=fc00:a3f0:723f:7bc4:f816:3eff:fe81:224d,dst=fc00:a3f0:723f:7bc4:f816:3eff:fee5:e7b2,label=0x357f1,proto=58,tclass=0,hlimit=64,frag=no),icmpv6(type=128,code=0)
> >     >>>>
> >     >>>> Looking at the code, it appears that for this state
> >     >>>> ctx->conntracked would not be set; it is currently initialized to
> >     >>>> 'false' and only updated based on frozen state for recirculated
> >     >>>> packets.
> >     >>>>
> >     >>>> Adding this patch resolves the inconsistency we discussed above and
> >     >>>> subsequently the ML2/OVS problem:
> >     >>>> diff --git a/ofproto/ofproto-dpif-xlate.c 
> > b/ofproto/ofproto-dpif-xlate.c
> >     >>>> index 110dab0ec..7bc7426ac 100644
> >     >>>> --- a/ofproto/ofproto-dpif-xlate.c
> >     >>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >     >>>> @@ -7852,6 +7852,12 @@ xlate_actions(struct xlate_in *xin, struct
> >     >>>> xlate_out *xout)
> >     >>>>          goto exit;
> >     >>>>      }
> >     >>>>
> >     >>>> +    if (!xin->frozen_state
> >     >>>> +        && xin->flow.ct_state
> >     >>>> +        && xin->flow.ct_state & CS_TRACKED) {
> >     >>>> +        ctx.conntracked = true;
> >     >>>> +    }
> >     >>>> +
> >     >>>>      /* Tunnel metadata in udpif format must be normalized before
> >     >>>> translation. */
> >     >>>>      if (flow->tunnel.flags & FLOW_TNL_F_UDPIF) {
> >     >>>>          const struct tun_table *tun_tab = ofproto_get_tun_tab(
> >     >>>> ---
> >     >>>>
> >     >>>> What do you think?
>
> I was ready to send the kernel patch below, but when I tried to run
> system tests, 2 of them failed.  And then I discovered this gem:
>
>   c2926d6d1cfd ("system-traffic: Add ct tests using local stack.")

Good catch! I only tested your proposed kernel change with the OVN
system tests and not the OVS ones.

> So, it might be that your change to set 'conntracked' to 'true'
> is actually the right one, since OVS system tests are expecting
> this scenario to work...

Ok, I've found a usable system test for this condition, so I'm ready
to submit something for this bit:
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 239105e89..53af3d3da 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -6807,6 +6807,46 @@ AT_CHECK([ovs-ofctl dump-flows br0 | grep table=2, | OFPROTO_CLEAR_DURATION_IDLE
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP

+dnl This is essentially a copy of the "datapath - ping over geneve tunnel"
+dnl test, and we use it to confirm OVS behavior when ct_state is set on
+dnl flows by the kernel without conscious action by the OVS user space code.
+AT_SETUP([conntrack - ct_state from outside OVS])
+OVS_CHECK_TUNNEL_TSO()
+OVS_CHECK_GENEVE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "priority=100,ct_state=+trk,actions=ct_clear,resubmit(,0)"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "priority=10,actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
+                  [vni 0])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_BANNER([IGMP])

 AT_SETUP([IGMP - flood under normal action])
---
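For reference, a single system test like this can be run on its own via the
kernel datapath testsuite with something along the lines of the following
(the keyword selection is a guess and may need tweaking):

  make check-kernel TESTSUITEFLAGS='-k ct_state'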

I should probably add something to confirm the ct_state=+trk flow rule
is hit, so this does not suddenly turn into a false positive.
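Perhaps something along these lines after the overlay ping would do (an
untested sketch; the exact grep may need adjusting to the dump-flows output):

dnl Make sure the +trk rule on br-underlay actually saw traffic, so the
dnl test cannot silently pass without exercising the ct_state path.
AT_CHECK([ovs-ofctl dump-flows br-underlay | grep "ct_state=+trk" | grep -q -v "n_packets=0"])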


The ct state across patch ports issue is a bit more involved to test,
so I wonder if we should use an OVN test for that? We could of course
have OVN stage it and see if it is easy to extract something to encode
into OVS system tests.

> I'm starting to question reality here already... Will get back
> to that issue next week.

Friday afternoon can have that effect on the best of us :-)

-- 
Frode Nordahl

> >     >>>>
> >     >>>
> >     >>> Hrm.  I suspected something like this.  My guess is that the packet
> >     >>> gets conntracked in OVS, then leaves OVS, gets routed inside the
> >     >>> kernel and enters OVS again while retaining the ct state from the
> >     >>> previous run.  Can you confirm that?
> >     >>
> >     >> In this setup there is an interface present in the system, attached to
> >     >> OVS as a port of type internal. It's essentially there to
> >     >> give a path into an OpenStack managed overlay network for some
> >     >> management daemons running "outside the cloud".
> >     >>
> >     >> Without the patches we've been working on in this thread so far,
> >     >> looking at the OF flow configuration side the packets traverse the
> >     >> tables like this:
> >     >> table=0, n_packets=6, n_bytes=604, priority=0 actions=resubmit(,59)
> >     >> table=59, n_packets=6, n_bytes=604, priority=0 actions=resubmit(,60)
> >     >> table=60, n_packets=4, n_bytes=384, priority=100,in_port="o-hm0"
> >     >> actions=load:0x3->NXM_NX_REG5[],load:0x1->NXM_NX_REG6[],resubmit(,71)
> >     >> table=71, n_packets=65, n_bytes=7670, priority=110,ct_state=+trk
> >     >> actions=ct_clear,resubmit(,71)
> >     >>
> >     >> Since we currently ignore the request for the ct_clear action in this
> >     >> state, the packet will be resubmitted until hitting the limit in the
> >     >> last rule above.
> >     >>
> >     >> The OVS kernel event tracing output from the same flow looks like 
> > this:
> >     >>             ping-207769  [010] ..... 33681.265450: ovs_dp_upcall:
> >     >> dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0
> >     >> skbaddr=00000000662a695d len=118 data_len=0 truesize=768 nr_frags=0
> >     >> gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000
> >     >> keyaddr=00000000900b6d94 eth_type=0xdd86 ct_state=21 ct_orig_proto=3a
> >     >> ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2805034951
> >     >> upcall_mru=0
> >     >>           <idle>-0       [010] ..s.. 33686.518866: ovs_dp_upcall:
> >     >> dpaddr=0000000053422f7e dp_name=ovs-system dev=o-hm0
> >     >> skbaddr=00000000ab9954ee len=86 data_len=0 truesize=768 nr_frags=0
> >     >> gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000
> >     >> keyaddr=00000000cfad77d1 eth_type=0xdd86 ct_state=00 ct_orig_proto=00
> >     >> ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2805034951
> >     >> upcall_mru=0
> >     >>        handler12-205602  [010] ..... 33686.519158:
> >     >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >     >> dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768
> >     >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >     >> recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86
> >     >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >     >> action_type=3 action_len=12 action_data=0000000046f8c388 is_last=0
> >     >>        handler12-205602  [010] ..... 33686.519159:
> >     >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >     >> dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768
> >     >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >     >> recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86
> >     >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >     >> action_type=1 action_len=4 action_data=00000000acdfdf2a is_last=0
> >     >>        handler12-205602  [010] ..... 33686.519197:
> >     >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >     >> dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768
> >     >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >     >> recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86
> >     >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >     >> action_type=4 action_len=4 action_data=0000000069492ddd is_last=0
> >     >>        handler12-205602  [010] ..... 33686.519197:
> >     >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >     >> dev=o-hm0 skbaddr=00000000662a695d len=86 data_len=0 truesize=768
> >     >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >     >> recirc_id=0x00000000 keyaddr=0000000008b47a0c eth_type=0xdd86
> >     >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >     >> action_type=1 action_len=4 action_data=00000000e5733215 is_last=1
> >     >>           <idle>-0       [011] ..s.. 33686.521143: ovs_dp_upcall:
> >     >> dpaddr=0000000053422f7e dp_name=ovs-system dev=gre_sys
> >     >> skbaddr=00000000369641d8 len=78 data_len=0 truesize=768 nr_frags=0
> >     >> gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000
> >     >> keyaddr=00000000dc600580 eth_type=0xdd86 ct_state=00 ct_orig_proto=00
> >     >> ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=3508503525
> >     >> upcall_mru=0
> >     >>        handler13-205603  [011] ..... 33686.521383:
> >     >> ovs_do_execute_action: dpaddr=0000000053422f7e dp_name=ovs-system
> >     >> dev=gre_sys skbaddr=000000007efe6e4d len=78 data_len=0 truesize=768
> >     >> nr_frags=0 gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000
> >     >> recirc_id=0x00000000 keyaddr=00000000303b841b eth_type=0xdd86
> >     >> ct_state=00 ct_orig_proto=00 ct_Zone=0000 flow_key_valid=1
> >     >> action_type=1 action_len=4 action_data=0000000033bb5908 is_last=1
> >     >>
> >     >> So to me it looks like the packet is conntracked before entering OVS
> >     >> the first time, and I guess we would have the same situation if it
> >     >> took a path like you suggested above.
> >     >>
> >     >>> If that's the case, I think, we actually have to clear ct state
> >     >>> while the packet is first entering the kernel datapath.  Otherwise
> >     >>> it can match on incorrect flows in the kernel cache.  Marking
> >     >>> it as conntracked will probably be against the logic of the OF
> >     >>> pipeline.
> >     >>
> >     >> We used to clear this from the OVS user space side, but I agree
> >     >> with you that that is problematic. This is probably why Numan was on
> >     >> the track of that in-kernel NF cache function.
> >     >>
> >     >>> Something like this (not tested):
> >     >>>
> >     >>> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> >     >>> index 82a74f998966..ebea64494196 100644
> >     >>> --- a/net/openvswitch/vport.c
> >     >>> +++ b/net/openvswitch/vport.c
> >     >>> @@ -451,6 +451,12 @@ int ovs_vport_receive(struct vport *vport, 
> > struct sk_buff *skb,
> >     >>>                 kfree_skb(skb);
> >     >>>                 return error;
> >     >>>         }
> >     >>> +
> >     >>> +       /* Packets entering OVS have to be in untracked state.  
> > Clearing here
> >     >>> +        * to be sure that no state persisted from the last pass 
> > through OVS.
> >     >>> +        */
> >     >>> +       ovs_ct_clear(skb, &key);
> >     >>> +
> >     >>>         ovs_dp_process_packet(skb, &key);
> >     >>>         return 0;
> >     >>>  }
> >     >>> ---
> >     >>
> >     >> Yep, this works:
> >     >>             ping-3265    [011] .....    97.820835: ovs_dp_upcall:
> >     >> dpaddr=00000000a05ba3f9 dp_name=ovs-system dev=o-hm0
> >     >> skbaddr=000000003b1e06d0 len=118 data_len=0 truesize=768 nr_frags=0
> >     >> gso_size=0 gso_type=0x0 ovs_flow_hash=0x00000000 recirc_id=0x00000000
> >     >> keyaddr=0000000077e1165d eth_type=0xdd86 ct_state=00 ct_orig_proto=00
> >     >> ct_zone=0000 flow_key_valid=1 upcall_cmd=1 upcall_port=2787207471
> >     >> upcall_mru=0
> >     >>
> >     >> An end-to-end workload test for the system described in [4] with this
> >     >> patched kernel is also successful. For good measure I also ran 
> > through
> >     >> the OVN kernel system testsuite and observed no regressions with this
> >     >> kernel.
> >     >>
> >     >>
> >     >> So to sum up, I guess we have two distinct bugs here: we need your
> >     >> proposed patch for ovs-dpif to fix the OVN hairpin issue, and the
> >     >> kernel patch for the conntracked packet entering the system issue 
> > that
> >     >> affects ML2/OVS.
> >     >
> >     > If we agree on how to proceed, I'd be happy to help put this into
> >     > formal patches with tests and appropriate attribution etc. Let me
> >     > know what you think and how you would prefer we split the work of
> >     > bringing this to a resolution.
> >     >
> >
> >     I'm working on a good commit message for the kernel piece above
> >     (ovs_vport_receive), will send it soon.
> >
> >
> > Great, thank you for doing that!
> >
> >     If you can help with stitching all the pieces for the OVS userspace
> >     part and writing tests for it, that would be great.
> >
> >
> > I will work on this part and hopefully submit tomorrow.
> >
> > --
> > Frode Nordahl
> >
> >     Best regards, Ilya Maximets.
> >
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
