Thanks, Han, for the explanation. Yes, there is no east-west traffic between the GRs (I was just curious to know). So, if the ARP request/response between GR and DR is confined to the same chassis, then there shouldn't be an O(n^2) explosion, per your explanation.
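To put rough numbers on that, here is a back-of-the-envelope sketch. The function names are mine, and it assumes 1000 gateway routers on a single join switch, two address families (IPv4 and IPv6), and no east-west traffic between GRs; it counts only the GR-side entries towards the join switch and ignores the external side.

```python
def static_neighbor_entries(num_gateway_routers: int, address_families: int = 2) -> int:
    """Total statically built neighbour entries across all GRs on one join switch."""
    # Each GR gets an entry for every other router port on the join switch:
    # (n - 1) peer GRs plus the one distributed router.
    peers_per_router = (num_gateway_routers - 1) + 1
    return num_gateway_routers * peers_per_router * address_families


def dynamic_neighbor_entries(num_gateway_routers: int, address_families: int = 2) -> int:
    """Total MAC_Binding entries if each GR only ever resolves the distributed router."""
    return num_gateway_routers * address_families


if __name__ == "__main__":
    n = 1000  # assumed number of gateway routers (one per node)
    print("static :", static_neighbor_entries(n))   # grows as O(n^2)
    print("dynamic:", dynamic_neighbor_entries(n))  # grows as O(n)
```

With n = 1000 this gives 2,000,000 static entries (2000 per GR, matching the figures quoted further down the thread) versus 2000 dynamically learned ones. If ARP resolution did happen between all pairs of GRs, the dynamic count would grow back towards the static one.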
Will get back to you on how the test goes in the next few days.

Regards,
Girish

On Sat, May 16, 2020 at 11:17 PM Han Zhou <zhou...@gmail.com> wrote:

>
> On Sat, May 16, 2020 at 12:13 PM Girish Moodalbail <gmoodalb...@gmail.com>
> wrote:
>
>> Hello Han,
>>
>> Can you please explain how the dynamic resolution of the IP-to-MAC will
>> work with this new option set?
>>
>> Say the packet is being forwarded from router2 towards the distributed
>> router. So, nexthop (reg0) is set to IP1 and we need to find the MAC
>> address M1 to set eth.dst to.
>>
>> +----------------+        +----------------+
>> |   l3gateway    |        |   l3gateway    |
>> |    router2     |        |    router3     |
>> +-------------+--+        +-+--------------+
>>            IP2,M2         IP3,M3
>>               |             |
>>            +--+-------------+---+
>>            |    join switch     |
>>            +---------+----------+
>>                      |
>>                   IP1,M1
>>              +-------+--------+
>>              |  distributed   |
>>              |     router     |
>>              +----------------+
>>
>> The MAC M1 will obviously not be in the MAC_binding table. On the
>> hypervisor where the packet originated, router2's port and the
>> distributed router's port are locally present. So, does this result in a
>> PACKET_IN to the ovn-controller, with the resolution happening there?
>>
>
> Yes there will be a PACKET_IN, and then:
> 1. ovn-controller will generate the ARP request for IP1, and send
> PACKET_OUT to OVS.
> 2. The ARP request will be delivered to the distributed router pipeline
> only, because of a special handling of ARP in OVN for IPs of router ports,
> although it is a broadcast. (It would have been broadcast to all GRs
> without that special handling.)
> 3. The distributed router pipeline should learn the IP-MAC binding of
> IP2-M2 (through a PACKET_IN to ovn-controller), and at the same time send
> the ARP reply to router2 in the distributed router pipeline.
> 4. Router2 pipeline will handle the ARP response and learn the IP-MAC
> binding of IP1-M1 (through a PACKET_IN to ovn-controller).
>
>>
>> How would the resolution of IP3-to-M3 happen on gateway router2? Will
>> there be an ARP request packet that will be broadcast on the join switch
>> for this case?
>>
>
> I think in the use case of ovn-k8s, as you described before, this should
> not happen. However, if this does happen, it is similar to the above steps,
> except that in steps 2) and 3) the ARP request and response will be sent
> between the chassis through the tunnel. If this happens between all pairs
> of GRs, then there will again be O(n^2) MAC_Binding entries.
>
> I haven't tested the GR scenario yet, so I can't guarantee it works as
> expected. Please let me know if you see any problems. I will submit a
> formal patch with more test cases if it is confirmed in your environment.
>
> Thanks,
> Han
>
>>
>> Regards,
>> ~Girish
>>
>> On Sat, May 16, 2020 at 10:25 AM Girish Moodalbail <gmoodalb...@gmail.com>
>> wrote:
>>
>>>
>>> On Sat, May 16, 2020 at 12:36 AM Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>>
>>>> On Tue, May 5, 2020 at 11:57 AM Han Zhou <hz...@ovn.org> wrote:
>>>> >
>>>> > On Fri, May 1, 2020 at 2:14 PM Dan Winship <danwins...@redhat.com>
>>>> > wrote:
>>>> > >
>>>> > > On 5/1/20 12:37 PM, Girish Moodalbail wrote:
>>>> > > > If we now look at table=12 (lr_in_arp_resolve) in the ingress
>>>> > > > pipeline of Gateway Router-1, then you will see that there will
>>>> > > > be 2000 logical flow entries...
>>>> > >
>>>> > > > In the topology above, the only intended path is North-South
>>>> > > > between each gateway router and the logical router. There is no
>>>> > > > east-west traffic between the gateway routers
>>>> > >
>>>> > > > Is there another way to solve the above problem with just
>>>> > > > keeping the single join logical switch?
>>>> > >
>>>> > > Two thoughts:
>>>> > >
>>>> > > 1. In openshift-sdn, the bridge doesn't try to handle ARP itself.
>>>> > > It just lets ARP requests pass through normally, and lets ARP
>>>> > > replies pass through normally as long as they are correct (ie, it
>>>> > > doesn't let spoofing through). This means fewer flows but more
>>>> > > traffic. Maybe that's the right tradeoff?
>>>> > >
>>>> > The 2M entries here are not for the ARP responder, but are more
>>>> > equivalent to the neighbour table (or ARP cache) on each LR. The ARP
>>>> > responder resides in the LS (join logical switch), which is O(n)
>>>> > instead of O(n^2), so it is not a problem here.
>>>> >
>>>> > However, a similar idea may work here to avoid the O(n^2) scale
>>>> > issue. For the neighbour table, OVN actually has two parts: one is
>>>> > statically built, which is the 2M entries mentioned in this case, and
>>>> > the other is the dynamic ARP resolve - the mac_binding table, which
>>>> > is dynamically populated by handling ARP messages. To solve the
>>>> > problem here, it is possible to change OVN to support configuring an
>>>> > LR to avoid the static neighbour table and rely only on dynamic ARP
>>>> > resolving. In this case, all the gateway routers can be configured as
>>>> > not using static ARP resolving, and eventually there will be only 2
>>>> > entries (one for IPv4 and one for IPv6) for each gateway router in
>>>> > the mac_binding table for the north-south traffic to the join router.
>>>> > (Of course there will still be the same amount of mac_bindings in
>>>> > each router for the external traffic on the other side of the gateway
>>>> > routers.)
>>>> >
>>>> > This change seems straightforward, but I am not sure if there are any
>>>> > corner cases.
>>>>
>>>> Hi Girish,
>>>>
>>>> I've sent an RFC patch here for the above proposal:
>>>> https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-git-send-email-hz...@ovn.org/
>>>> For this use case, just set options:dynamic_neigh_routes=true for all
>>>> the Gateway Routers. Could you try it in your scale environment and see
>>>> if it solves the problem?
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>> >
>>>> > > 2. In most places in ovn-kubernetes, our MAC addresses are
>>>> > > programmatically related to the corresponding IP addresses, and in
>>>> > > places where that's not currently true, we could try to make it
>>>> > > true, and then perhaps the thousands of rules could just be
>>>> > > replaced by a single rule?
>>>> > >
>>>> > This may be a good idea, but I am not sure how to implement it in OVN
>>>> > to make it generic, since most OVN users can't make such an
>>>> > assumption.
>>>> >
>>>> > On the other hand, why wouldn't splitting the join logical switch
>>>> > into 1000 LSes solve the problem? I understand that there will be
>>>> > 1000 more datapaths, and 1000 more LRPs, but these are all O(n),
>>>> > which is much more efficient than the O(n^2) explosion. What are the
>>>> > other scale issues created by this?
>>>> >
>>>> > In addition, Girish, for the external LS, I am not sure why it can't
>>>> > be shared, if all the nodes are connected to a single L2 network. (If
>>>> > they are connected to separate L2 networks, different external LSes
>>>> > should be created, at least according to the current OVN model.)
>>>>
>>>
>>> Thanks Han for the patch. Will give it a try and let you know.
>>>
>>> Regards,
>>> ~Girish
>>>
>>>> >
>>>> > Thanks,
>>>> > Han
>>>>
>>>
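For trying the option suggested above on many Gateway Routers, a small helper like the following could generate the commands. This is only a sketch under assumptions: the router names (GR_node1, GR_node2) are hypothetical placeholders, the option name is copied as written for the RFC patch and may differ in a later version, and the script only prints `ovn-nbctl set` commands rather than executing them.

```python
import shlex

# Option name as given for the RFC patch in this thread; treat it as an
# assumption, since an RFC option may be renamed before it is merged.
OPTION = "options:dynamic_neigh_routes=true"


def build_set_commands(gateway_routers):
    """Return one 'ovn-nbctl set Logical_Router ...' command per router name."""
    return [
        shlex.join(["ovn-nbctl", "set", "Logical_Router", name, OPTION])
        for name in gateway_routers
    ]


if __name__ == "__main__":
    # Hypothetical Gateway Router names; substitute the real ones.
    for command in build_set_commands(["GR_node1", "GR_node2"]):
        print(command)
```

Printing the commands first makes it easy to review them before piping them to a shell in a test environment.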
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss