On Tue, May 28, 2019 at 07:09:00PM +0200, Job Snijders wrote:
> On Tue, May 28, 2019 at 05:17:08PM +0200, Claudio Jeker wrote:
> > On Tue, May 28, 2019 at 01:28:32PM +0200, Job Snijders wrote:
> > > On Mon, May 13, 2019 at 09:03:41PM +0200, Claudio Jeker wrote:
> > > > When using a rule forcing the nexthop to a specific address bgpd
> > > > currently does not mark that nexthop as no-modify. In other words
> > > > the default rules for nexthop propagation applies. This means that
> > > > for ebgp it only sends out the set nexthop when this nexthop is 
> > > > connected
> > > > and on the same network as the peer. So while the Adj-RIB-Out shows the
> > > > right nexthop it is actually not on the wire.
> > > > 
> > > > This diff changes set nexthop 198.51.100.42 to also imply set nexthop
> > > > no-modify. This way the set nexthop is always on the wire.
> > > > The problem with that is that it will hand you a nice footgun ready to
> > > > blow of your big toe (but in the end the current behaviour is doing the
> > > > same just with a different angle of attack) .
> > > > 
> > > > The set nexthop section in bgpd.conf.5 needs to be adjusted once a
> > > > decision is made on how to handle this.
> > > 
> > > I think I'm not a big fan of this change.
> > > 
> > > Section 5.1.3 of RFC 4271 (the core BGP spec) is very explicit on
> > > what NEXT_HOPs are valid to send over a non-multihop BGP session.
> > > Only addresses that are part of the linknet between the two routers
> > > are valid NEXT_HOP addresses on the wire. This changes makes it
> > > trivial to send not-valid NEXT_HOPs to a neighbor, this may result
> > > in very hard to debug troublecases. Feels like we'll be handing way
> > > too much rope to users, especially since it facilitates protocol
> > > violations.
> > > 
> > > I am not aware of a real world BGP implementation that would allow
> > > you to send completely arbitrary NEXT_HOPs.
> > 
> > I came to a similar conclusion and will send out a better diff. The
> > idea is that for the non-multihop BGP sessions we should require the
> > nexthop to be in the same network. 
> 
> With 'same network', you mean the NEXT_HOP IP address must be part of
> the router-to-router linknet, right?
> 
> > The ebgp multihop case is currently always sticking to the local
> > address and should probably respect the nexthop if set explicitly
> > (that is also what the RFC suggests). 
> 
> Right.
> 
> > ibgp seems to do the right thing.
> 
> yup, in IBGP the NEXT_HOP can be anything.
> 
> > The same rules need to be applied to "set nexthop no-modify" since
> > that is currently on massive hammer.
> 
> Right.
> 

So here is the diff for this change. In short nexthop no-modify has only
relevance for ebgp multihop links now (for the other cases it is either
the default or ignored). set nexthop self is still overriding all
decisions. ebgp just does the same net check and decides based on that
data. I refactored up_get_nexthop in a way that allows it to be used
for all AFI/SAFI cases and so 3 of the 4 almost identical code versions
did go away and the resulting up_get_nexthop() is a lot nicer now.

I decided to drop the implied NEXTHOP_NOMODIFY on set nexthop 198.51.100.42. 
It makes more sense to set this on a per ebgp multihop session if that is
requested. The reason is that a NEXTHOP_NOMODIFY can not be toggled off
again and so it may cause nasty config hacks in complex setups. People
should just use 'set nexthop no-modify' on the ebgp multihop neighbor
block or in the filters.

Tested with IPv4 and IPv6. OK?
-- 
:wq Claudio

Index: rde_update.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v
retrieving revision 1.111
diff -u -p -r1.111 rde_update.c
--- rde_update.c        13 May 2019 21:13:04 -0000      1.111
+++ rde_update.c        29 May 2019 13:07:31 -0000
@@ -238,59 +238,84 @@ up_generate_default(struct filter_head *
 }
 
 /* only for IPv4 */
-static in_addr_t
-up_get_nexthop(struct rde_peer *peer, struct filterstate *state)
+static struct bgpd_addr *
+up_get_nexthop(struct rde_peer *peer, struct filterstate *state, u_int8_t aid)
 {
-       in_addr_t       mask;
+       struct bgpd_addr *peer_local;
 
-       /* nexthop, already network byte order */
-       if (state->nhflags & NEXTHOP_NOMODIFY) {
-               /* no modify flag set */
-               if (state->nexthop == NULL)
-                       return (peer->local_v4_addr.v4.s_addr);
-               else
-                       return (state->nexthop->exit_nexthop.v4.s_addr);
-       } else if (state->nhflags & NEXTHOP_SELF)
-               return (peer->local_v4_addr.v4.s_addr);
-       else if (!peer->conf.ebgp) {
+       switch (aid) {
+       case AID_INET:
+       case AID_VPN_IPv4:
+               peer_local = &peer->local_v4_addr;
+               break;
+       case AID_INET6:
+       case AID_VPN_IPv6:
+               peer_local = &peer->local_v6_addr;
+               break;
+       default:
+               fatalx("%s, bad AID %s", __func__, aid2str(aid));
+       }
+
+       if (state->nhflags & NEXTHOP_SELF) {
                /*
-                * If directly connected use peer->local_v4_addr
-                * this is only true for announced networks.
+                * Forcing the nexthop to self is always possible
+                * and has precedence over other flags.
                 */
-               if (state->nexthop == NULL)
-                       return (peer->local_v4_addr.v4.s_addr);
-               else if (state->nexthop->exit_nexthop.v4.s_addr ==
-                   peer->remote_addr.v4.s_addr)
-                       /*
-                        * per RFC: if remote peer address is equal to
-                        * the nexthop set the nexthop to our local address.
-                        * This reduces the risk of routing loops.
-                        */
-                       return (peer->local_v4_addr.v4.s_addr);
-               else
-                       return (state->nexthop->exit_nexthop.v4.s_addr);
+               return (peer_local);
+       } else if (!peer->conf.ebgp) {
+               /*
+                * in the ibgp case the nexthop is normally not
+                * modified unless it points at the peer itself.
+                */
+               if (state->nexthop == NULL) {
+                       /* announced networks without explicit nexthop set */
+                       return (peer_local);
+               }
+               /*
+                * per RFC: if remote peer address is equal to the nexthop set
+                * the nexthop to our local address. This reduces the risk of
+                * routing loops. This overrides NEXTHOP_NOMODIFY.
+                */
+               if (memcmp(&state->nexthop->exit_nexthop,
+                   &peer->remote_addr, sizeof(peer->remote_addr)) == 0) {
+                       return (peer_local);
+               }
+               return (&state->nexthop->exit_nexthop);
        } else if (peer->conf.distance == 1) {
-               /* ebgp directly connected */
+               /*
+                * In the ebgp directly connected case never send
+                * out a nexthop that is outside of the connected
+                * network of the peer. No matter what flags are
+                * set. This follows section 5.1.3 of RFC 4271.
+                * So just check if the nexthop is in the same net
+                * is enough here.
+                */
                if (state->nexthop != NULL &&
-                   state->nexthop->flags & NEXTHOP_CONNECTED) {
-                       mask = htonl(
-                           prefixlen2mask(state->nexthop->nexthop_netlen));
-                       if ((peer->remote_addr.v4.s_addr & mask) ==
-                           (state->nexthop->nexthop_net.v4.s_addr & mask))
-                               /* nexthop and peer are in the same net */
-                               return (state->nexthop->exit_nexthop.v4.s_addr);
-                       else
-                               return (peer->local_v4_addr.v4.s_addr);
-               } else
-                       return (peer->local_v4_addr.v4.s_addr);
-       } else
-               /* ebgp multihop */
+                   state->nexthop->flags & NEXTHOP_CONNECTED &&
+                   prefix_compare(&peer->remote_addr,
+                   &state->nexthop->nexthop_net,
+                   state->nexthop->nexthop_netlen) == 0) {
+                       /* nexthop and peer are in the same net */
+                       return (&state->nexthop->exit_nexthop);
+               }
+               return (peer_local);
+       } else {
                /*
-                * For ebgp multihop nh->flags should never have
-                * NEXTHOP_CONNECTED set so it should be possible to unify the
-                * two ebgp cases. But this is safe and RFC compliant.
+                * For ebgp multihop make it possible to overrule
+                * the sent nexthop by setting NEXTHOP_NOMODIFY.
+                * Similar to the ibgp case there is no same net check
+                * needed but still ensure that the nexthop is not
+                * pointing to the peer itself.
                 */
-               return (peer->local_v4_addr.v4.s_addr);
+               if (state->nhflags & NEXTHOP_NOMODIFY &&
+                   state->nexthop == NULL &&
+                   memcmp(&state->nexthop->exit_nexthop,
+                   &peer->remote_addr, sizeof(peer->remote_addr)) != 0) {
+                       /* no modify flag set and nexthop not peer addr */
+                       return (&state->nexthop->exit_nexthop);
+               }
+               return (peer_local);
+       }
 }
 
 static int
@@ -350,7 +375,8 @@ up_generate_attr(u_char *buf, int len, s
                case ATTR_NEXTHOP:
                        switch (aid) {
                        case AID_INET:
-                               nexthop = up_get_nexthop(peer, state);
+                               nexthop =
+                                   up_get_nexthop(peer, state, aid)->v4.s_addr;
                                if ((r = attr_write(buf + wlen, len,
                                    ATTR_WELL_KNOWN, ATTR_NEXTHOP, &nexthop,
                                    4)) == -1)
@@ -725,10 +751,10 @@ static int
 up_generate_mp_reach(u_char *buf, int len, struct rde_peer *peer,
     struct filterstate *state, u_int8_t aid)
 {
-       u_char          *attrbuf;
-       int              r;
-       int              wpos, attrlen;
-       u_int16_t        tmp;
+       struct bgpd_addr        *nexthop;
+       u_char                  *attrbuf;
+       int                      r, wpos, attrlen;
+       u_int16_t                tmp;
 
        if (len < 4)
                return (-1);
@@ -751,54 +777,10 @@ up_generate_mp_reach(u_char *buf, int le
                attrbuf[3] = sizeof(struct in6_addr);
                attrbuf[20] = 0; /* Reserved must be 0 */
 
-               /* nexthop dance see also up_get_nexthop() */
+               /* write nexthop */
                attrbuf += 4;
-               if (state->nhflags & NEXTHOP_NOMODIFY) {
-                       /* no modify flag set */
-                       if (state->nexthop == NULL)
-                               memcpy(attrbuf, &peer->local_v6_addr.v6,
-                                   sizeof(struct in6_addr));
-                       else
-                               memcpy(attrbuf,
-                                   &state->nexthop->exit_nexthop.v6,
-                                   sizeof(struct in6_addr));
-               } else if (state->nhflags & NEXTHOP_SELF)
-                       memcpy(attrbuf, &peer->local_v6_addr.v6,
-                           sizeof(struct in6_addr));
-               else if (!peer->conf.ebgp) {
-                       /* ibgp */
-                       if (state->nexthop == NULL ||
-                           (state->nexthop->exit_nexthop.aid == AID_INET6 &&
-                           !memcmp(&state->nexthop->exit_nexthop.v6,
-                           &peer->remote_addr.v6, sizeof(struct in6_addr))))
-                               memcpy(attrbuf, &peer->local_v6_addr.v6,
-                                   sizeof(struct in6_addr));
-                       else
-                               memcpy(attrbuf,
-                                   &state->nexthop->exit_nexthop.v6,
-                                   sizeof(struct in6_addr));
-               } else if (peer->conf.distance == 1) {
-                       /* ebgp directly connected */
-                       if (state->nexthop != NULL &&
-                           state->nexthop->flags & NEXTHOP_CONNECTED)
-                               if (prefix_compare(&peer->remote_addr,
-                                   &state->nexthop->nexthop_net,
-                                   state->nexthop->nexthop_netlen) == 0) {
-                                       /*
-                                        * nexthop and peer are in the same
-                                        * subnet
-                                        */
-                                       memcpy(attrbuf,
-                                           &state->nexthop->exit_nexthop.v6,
-                                           sizeof(struct in6_addr));
-                                       break;
-                               }
-                       memcpy(attrbuf, &peer->local_v6_addr.v6,
-                           sizeof(struct in6_addr));
-               } else
-                       /* ebgp multihop */
-                       memcpy(attrbuf, &peer->local_v6_addr.v6,
-                           sizeof(struct in6_addr));
+               nexthop = up_get_nexthop(peer, state, aid);
+               memcpy(attrbuf, &nexthop->v6, sizeof(struct in6_addr));
                break;
        case AID_VPN_IPv4:
                attrlen = 17; /* AFI + SAFI + NH LEN + NH + Reserved */
@@ -813,55 +795,10 @@ up_generate_mp_reach(u_char *buf, int le
                bzero(attrbuf + 4, sizeof(u_int64_t));
                attrbuf[16] = 0; /* Reserved must be 0 */
 
-               /* nexthop dance see also up_get_nexthop() */
+               /* write nexthop */
                attrbuf += 12;
-               if (state->nhflags & NEXTHOP_NOMODIFY) {
-                       /* no modify flag set */
-                       if (state->nexthop == NULL)
-                               memcpy(attrbuf, &peer->local_v4_addr.v4,
-                                   sizeof(struct in_addr));
-                       else
-                               /* nexthops are stored as IPv4 addrs */
-                               memcpy(attrbuf,
-                                   &state->nexthop->exit_nexthop.v4,
-                                   sizeof(struct in_addr));
-               } else if (state->nhflags & NEXTHOP_SELF) {
-                       memcpy(attrbuf, &peer->local_v4_addr.v4,
-                           sizeof(struct in_addr));
-               } else if (!peer->conf.ebgp) {
-                       /* ibgp */
-                       if (state->nexthop == NULL ||
-                           (state->nexthop->exit_nexthop.aid == AID_INET &&
-                           !memcmp(&state->nexthop->exit_nexthop.v4,
-                           &peer->remote_addr.v4, sizeof(struct in_addr))))
-                               memcpy(attrbuf, &peer->local_v4_addr.v4,
-                                   sizeof(struct in_addr));
-                       else
-                               memcpy(attrbuf,
-                                   &state->nexthop->exit_nexthop.v4,
-                                   sizeof(struct in_addr));
-               } else if (peer->conf.distance == 1) {
-                       /* ebgp directly connected */
-                       if (state->nexthop != NULL &&
-                           state->nexthop->flags & NEXTHOP_CONNECTED)
-                               if (prefix_compare(&peer->remote_addr,
-                                   &state->nexthop->nexthop_net,
-                                   state->nexthop->nexthop_netlen) == 0) {
-                                       /*
-                                        * nexthop and peer are in the same
-                                        * subnet
-                                        */
-                                       memcpy(attrbuf,
-                                           &state->nexthop->exit_nexthop.v4,
-                                           sizeof(struct in_addr));
-                                       break;
-                               }
-                       memcpy(attrbuf, &peer->local_v4_addr.v4,
-                           sizeof(struct in_addr));
-               } else
-                       /* ebgp multihop */
-                       memcpy(attrbuf, &peer->local_v4_addr.v4,
-                           sizeof(struct in_addr));
+               nexthop = up_get_nexthop(peer, state, aid);
+               memcpy(attrbuf, &nexthop->v4, sizeof(struct in_addr));
                break;
        case AID_VPN_IPv6:
                attrlen = 29; /* AFI + SAFI + NH LEN + NH + Reserved */
@@ -876,54 +813,10 @@ up_generate_mp_reach(u_char *buf, int le
                bzero(attrbuf + 4, sizeof(u_int64_t));
                attrbuf[28] = 0; /* Reserved must be 0 */
 
-               /* nexthop dance see also up_get_nexthop() */
+               /* write nexthop */
                attrbuf += 12;
-               if (state->nhflags & NEXTHOP_NOMODIFY) {
-                       /* no modify flag set */
-                       if (state->nexthop == NULL)
-                               memcpy(attrbuf, &peer->local_v6_addr.v6,
-                                   sizeof(struct in6_addr));
-                       else
-                               memcpy(attrbuf,
-                                   &state->nexthop->exit_nexthop.v6,
-                                   sizeof(struct in6_addr));
-               } else if (state->nhflags & NEXTHOP_SELF)
-                       memcpy(attrbuf, &peer->local_v6_addr.v6,
-                           sizeof(struct in6_addr));
-               else if (!peer->conf.ebgp) {
-                       /* ibgp */
-                       if (state->nexthop == NULL ||
-                           (state->nexthop->exit_nexthop.aid == AID_INET6 &&
-                           !memcmp(&state->nexthop->exit_nexthop.v6,
-                           &peer->remote_addr.v6, sizeof(struct in6_addr))))
-                               memcpy(attrbuf, &peer->local_v6_addr.v6,
-                                   sizeof(struct in6_addr));
-                       else
-                               memcpy(attrbuf,
-                                   &state->nexthop->exit_nexthop.v6,
-                                   sizeof(struct in6_addr));
-               } else if (peer->conf.distance == 1) {
-                       /* ebgp directly connected */
-                       if (state->nexthop != NULL &&
-                           state->nexthop->flags & NEXTHOP_CONNECTED)
-                               if (prefix_compare(&peer->remote_addr,
-                                   &state->nexthop->nexthop_net,
-                                   state->nexthop->nexthop_netlen) == 0) {
-                                       /*
-                                       * nexthop and peer are in the same
-                                       * subnet
-                                       */
-                                       memcpy(attrbuf,
-                                           &state->nexthop->exit_nexthop.v6,
-                                           sizeof(struct in6_addr));
-                                       break;
-                               }
-                       memcpy(attrbuf, &peer->local_v6_addr.v6,
-                           sizeof(struct in6_addr));
-               } else
-                       /* ebgp multihop */
-                       memcpy(attrbuf, &peer->local_v6_addr.v6,
-                           sizeof(struct in6_addr));
+               nexthop = up_get_nexthop(peer, state, aid);
+               memcpy(attrbuf, &nexthop->v6, sizeof(struct in6_addr));
                break;
        default:
                fatalx("up_generate_mp_reach: unknown AID");

Reply via email to