On 2018/10/05 18:38, Alexander Bluhm wrote:
> IPv6 Source selection is a mess!
>
> > ICMP6 messages
> > are generated with a source of, I think, the local address associated with
> > the route to the recipient,
>
> It is not that simple. Look at in6_ifawithscope() in sys/netinet6/in6.c.
I know that's used for newly generated packets, but I'm not sure it's the
case for icmp6. I haven't tried modifying the kernel to confirm it, but the
following code in icmp6.c seems likely to be what generates the address in
this case:
	/*
	 * If the incoming packet was addressed directly to us (i.e. unicast),
	 * use dst as the src for the reply.
	 * The IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED case would be VERY rare,
	 * but is possible (for example) when we encounter an error while
	 * forwarding procedure destined to a duplicated address of ours.
	 */
	rt = rtalloc(sin6tosa(&sa6_dst), 0, rtableid);
	if (rtisvalid(rt) && ISSET(rt->rt_flags, RTF_LOCAL) &&
	    !ISSET(ifatoia6(rt->rt_ifa)->ia6_flags,
	    IN6_IFF_ANYCAST|IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED)) {
		src = &t;
	}
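
To make the condition in that excerpt easier to follow, here is a minimal
userspace sketch of the same test. It is only a model: struct model_route,
the X_* flag values and reply_uses_orig_dst() are hypothetical stand-ins,
not kernel definitions, and whatever the kernel does when the test fails is
outside the excerpt and not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IN6_IFF_*; the values are illustrative only. */
#define X_ANYCAST	0x01
#define X_TENTATIVE	0x02
#define X_DUPLICATED	0x04

/* Hypothetical model of the route looked up for the original destination. */
struct model_route {
	bool	valid;		/* models rtisvalid(rt) */
	bool	local;		/* models ISSET(rt->rt_flags, RTF_LOCAL) */
	int	addr_flags;	/* models ifatoia6(rt->rt_ifa)->ia6_flags */
};

/*
 * Return true when the reply would reuse the original destination as its
 * source, i.e. when the branch in the excerpt would set src.
 */
static bool
reply_uses_orig_dst(const struct model_route *rt)
{
	if (rt == NULL || !rt->valid || !rt->local)
		return false;
	return (rt->addr_flags &
	    (X_ANYCAST | X_TENTATIVE | X_DUPLICATED)) == 0;
}
```

Under this model a traceroute probe that is being forwarded has a
destination that is not a local address, so the test fails and the source
has to come from somewhere else.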
> Could you provide your ifconfig and route output? So we could
> figure out into which path of this algorithm you are running.
The host running traceroute has a handful of global scope addresses on
loopback interfaces, plus a global scope address on a vlan interface
facing the next router, all advertised into ospf.
The default source address is one of the loopbacks,
2001:67c:15f4:a423::26, so the only route back from the rest of the
network to this address is via link-locals all the way.
BGP routes have changed since, so I'll include a fresh traceroute using the
default source address so that all the newly copied output is consistent:
$ traceroute6 -n www.google.com
traceroute6 to www.google.com (2a00:1450:4009:80b::2004), 64 hops max, 60 byte packets
1 fe80::5606:33d8:d784:cd2f%vlan701 0.494 ms 0.362 ms 0.373 ms
2 * * *
3 * * *
4 2001:7f8:17::1b1b:1 7.272 ms 7.332 ms 6.938 ms
5 2001:7f8:17::3b41:1 6.699 ms 6.342 ms 6.453 ms
[...]
From the first hop router:
gr1$ route -n get -inet6 2001:67c:15f4:a423::26
route to: 2001:67c:15f4:a423::26
destination: 2001:67c:15f4:a423::26
mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
gateway: fe80::e648:4970:e85f:5e13%vlan701
interface: vlan701
if address: fe80::5606:33d8:d784:cd2f%vlan701
priority: 32 (ospf)
flags: <UP,GATEWAY,HOST,DONE>
use mtu expire
29464060 0 0
From the second hop router:
gr5$ route -n get -inet6 2001:67c:15f4:a423::26
route to: 2001:67c:15f4:a423::26
destination: 2001:67c:15f4:a423::26
mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
gateway: fe80::d05a:f0e8:e5e:a30a%vlan740
interface: vlan740
if address: fe80::b769:5751:d87b:44b7%vlan740
priority: 32 (ospf)
flags: <UP,GATEWAY,HOST,DONE>
use mtu expire
20369017 0 0
If I source packets from the vlan interface (directly connected to the
next router) instead, I get this:
$ traceroute6 -n -s 2a03:8920:1:52bd::184 www.google.com
traceroute6 to www.google.com (2a00:1450:4009:80b::2004) from 2a03:8920:1:52bd::184, 64 hops max, 60 byte packets
1 2a03:8920:1:52bd::181 1.769 ms 0.382 ms 0.377 ms
2 * * *
3 * * *
4 2001:7f8:17::1b1b:1 6.931 ms 6.999 ms 7.115 ms
5 2001:7f8:17::3b41:1 6.466 ms 6.568 ms 6.416 ms
In this case the routes on the hop 1 and hop 2 routers are:
gr1$ route -n get -inet6 2a03:8920:1:52bd::184
route to: 2a03:8920:1:52bd::184
destination: 2a03:8920:1:52bd::184
mask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
interface: vlan701
if address: 2a03:8920:1:52bd::181
priority: 3 ()
flags: <UP,HOST,DONE,LLINFO,CLONED>
use mtu expire
22811 0 3
gr5$ route -n get -inet6 2a03:8920:1:52bd::184
route to: 2a03:8920:1:52bd::184
destination: 2a03:8920:1:52bd::
mask: ffff:ffff:ffff:ffff::
gateway: fe80::d05a:f0e8:e5e:a30a%vlan740
interface: vlan740
if address: fe80::b769:5751:d87b:44b7%vlan740
priority: 32 (ospf)
flags: <UP,GATEWAY,DONE>
use mtu expire
234 0 0
So the first hop's reply is returned from a "normal" address, while the
second hop's reply (seen with tcpdump) is returned from
fe80::b769:5751:d87b:44b7 and of course doesn't make it all the way back to
the host running traceroute.
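
As a quick way to check which reply sources in a capture are link-local
(and so cannot be routed back across more than one hop), something like the
sketch below could be used. It relies only on the standard inet_pton() and
the IN6_IS_ADDR_LINKLOCAL macro; the helper name is_linklocal() is made up
for this example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Return 1 if the textual IPv6 address is link-local (fe80::/10),
 * 0 if it parses but is not link-local, and -1 if it does not parse.
 */
static int
is_linklocal(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return -1;
	return IN6_IS_ADDR_LINKLOCAL(&addr) ? 1 : 0;
}
```

With the addresses above, is_linklocal("2a03:8920:1:52bd::181") gives 0 and
is_linklocal("fe80::b769:5751:d87b:44b7") gives 1. Strip any %interface
suffix first, since inet_pton() is not specified to accept one.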
> Once I have added the following rule from a newer RFC. It
> makes things better for many cases, but not with OSPF6. There you
> may have an interface with only link-local addresses. Then this
> link-local is used instead of another better scoped one.
I have global addresses on all of the interfaces involved; none of them
has only a link-local address.
> /* RFC 3484 5. Rule 5: Prefer outgoing interface */
>
> > 4 2001:728:0:5000::55 7.843 ms 8.236 ms 7.391 ms
>
> How can this work? Does your AS-Boundary Router do NAT?
The source address on my traceroutes is a global scope address in all these
cases, so my upstream or peer knows how to route back to that.
> > What's anyone else doing? Just living with it or has anyone figured a way
> > to make it nicer? I'd like to reply with either a global scope address for
> > the interface, or a loopback address,
>
> We have implemented more or less a very old RFC. There are two
> newer RFCs with different algorithms. There is recommendation to
> store policies from user-land into the kernel for address selection.
>
> I have just looked at FreeBSD in6_ifawithifp(), it is quite simple.
> Perhaps this is a way to go.
The code in FreeBSD's icmp6.c matching the above calls in6ifa_ifwithaddr
https://svnweb.freebsd.org/base/head/sys/netinet6/icmp6.c?revision=338831&view=markup#l2113
> > I didn't get anywhere with PF
> > translation though.
>
> pf with IPv6 link-local addresses does not work properly. I think
> it cannot parse the %if suffixes. The KAME hack scope id is not
> handled.
Thank you.