Public bug reported:

Hi,
NOTE: I was unsure how to report this bug: one suggestion was to report it against the distro, another was to use the mailing list.
(Also see: https://marc.info/?l=linux-netdev&m=171953240705042&w=2)

This appears to be a bug in Linux kernel networking. It was observed on a fresh install of Ubuntu 24.04, with Linux 6.8.0-36-generic.

PROBLEM

In the network diagram below, I have two default routers (TR1 and TR2). The HUT has two neighbor cache entries: TR1=REACHABLE and TR2=INCOMPLETE. When I ping the host (HUT) from a remote test node (TN2) via TR1, the HUT sends an NS for TR2 when it should have replied directly via TR1. This breaks communication and violates IPv6 Logo compliance.

             TN2
              |
     +--------+--------+
     |                 |
    TR1               TR2
 (REACHABLE)     (INCOMPLETE)
     |                 |
     +--------+--------+
              |
             HUT

The RFC for Neighbor Discovery describes the policy for selecting routers from the Default Router List. The relevant bullet is extracted below…

RFC4861 6.3.6. Default Router Selection

   The policy for selecting routers from the Default Router List is as
   follows:

   1) Routers that are reachable or probably reachable (i.e., in any
      state other than INCOMPLETE) SHOULD be preferred over routers
      whose reachability is unknown or suspect (i.e., in the
      INCOMPLETE state, or for which no Neighbor Cache entry exists).
      Further implementation hints on default router selection when
      multiple equivalent routers are available are discussed in
      [LD-SHRE] (https://datatracker.ietf.org/doc/html/rfc4861#ref-LD-SHRE).

REPRODUCER

This condition is created by configuring two routers under systemd-networkd, either by having each router send an RA, or by statically configuring one router at a time. I show the steps for the static configuration below.

Assuming you have an interface named “enp0s9” and you’re using systemd-networkd as the network manager:

1. Configure the Host (HUT) with one router (TR1)

   $ networkctl cat 10-enp0s9.network
   # /etc/systemd/network/10-enp0s9.network
   [Match]
   Name=enp0s9

   [Link]
   RequiredForOnline=no

   [Network]
   Description="Internal Network: Private VM-to-VM IPv6 interface"
   DHCP=no
   LLDP=no
   EmitLLDP=no

   # /etc/systemd/network/10-enp0s9.network.d/address.conf
   [Network]
   Address=2001:2:0:1000:a00:27ff:fe5f:f72d/64

   # /etc/systemd/network/10-enp0s9.network.d/route-1060.conf
   [Route]
   Gateway=fe80::200:10ff:fe10:1060
   GatewayOnLink=true

2. Start or reload the configuration

   $ sudo networkctl reload
   $ sudo networkctl reconfigure enp0s9
   $ ip -6 r
   2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
   fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
   fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
   default via fe80::200:10ff:fe10:1060 dev enp0s9 proto static metric 1024 onlink pref medium

3. Flush and monitor the neighbor cache

   $ sudo ip -6 neigh flush all; ip -6 -ts monitor neigh

4. From TN1, ping HUT via TR1; the HUT’s NCE is updated to REACHABLE

   [2024-06-28T08:13:27.617674] fe80::200:10ff:fe10:1060 dev enp0s9 lladdr 00:00:10:10:10:60 router REACHABLE

   NOTE: tcpdump shows the expected protocol exchange.

5. Configure the Host (HUT) with a 2nd router (TR2)

   $ cat /etc/systemd/network/10-enp0s9.network.d/route-1061.conf
   [Route]
   Gateway=fe80::200:10ff:fe10:1061
   GatewayOnLink=true

   $ sudo networkctl reload
   $ sudo networkctl reconfigure enp0s9
   $ ip -6 r
   2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
   fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
   fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
   default proto static metric 1024 pref medium
           nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
           nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1

6. Start monitoring traffic with tcpdump/Wireshark

7. From TN1, ping HUT via TR1
   a. An echo reply is never received
   b. The protocol exchange shows the HUT sends an NS for TR2 (which is NOT REACHABLE) when it should have sent an echo-reply via TR1 (which is REACHABLE).

OBSERVATIONS

1. When NOT using systemd-networkd and each router sends an RA, the kernel behaves correctly.

2. The routing table looks different, depending on whether the kernel adds the route or systemd-networkd adds the route. E.g.

   a. Kernel adds two separate “default route” entries (systemd-networkd is stopped)

      $ ip -6 route
      <deleted lines>
      default via fe80::200:10ff:fe10:1060 proto ra metric 1024 expires 39sec hoplimit 64 pref medium
      default via fe80::200:10ff:fe10:1061 proto ra metric 1024 expires 44sec hoplimit 64 pref medium

   b. Systemd-networkd adds one “default route” with two nexthop options (systemd-networkd is running)

      $ ip -6 route
      <deleted lines>
      default proto ra metric 1024 expires 589sec pref medium
              nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
              nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1

TCPDUMP

For completeness, here is the annotated output from tcpdump…

$ tcpdump -r ~/v6LC_2_2_11-bug-report-summary.pcapng -t -n --number -e
reading from file /home/matt/v6LC_2_2_11-bug-report-summary.pcapng, link-type EN10MB (Ethernet), snapshot length 262144

# Step 4: TN1(1181) pings HUT(f72d) via TR1(1060)
1 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1001:200:10ff:fe10:1181 > 2001:2:0:1000:a00:27ff:fe5f:f72d: ICMP6, echo request, id 0, seq 0, length 16
2 08:00:27:5f:f7:2d > 33:33:ff:10:10:60, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1060: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1060, length 32
3 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 86: fe80::200:10ff:fe10:1060 > fe80::a00:27ff:fe5f:f72d: ICMP6, neighbor advertisement, tgt is fe80::200:10ff:fe10:1060, length 32
4 08:00:27:5f:f7:2d > 00:00:10:10:10:60, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1000:a00:27ff:fe5f:f72d > 2001:2:0:1001:200:10ff:fe10:1181: ICMP6, echo reply, id 0, seq 0, length 16
# HUT has replied to TN1 via TR1. NCE for TR1=REACHABLE

# Step 5: Now configure TR2

# Step 7: TN1(1181) pings HUT(f72d) via TR1(1060)
5 00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1001:200:10ff:fe10:1181 > 2001:2:0:1000:a00:27ff:fe5f:f72d: ICMP6, echo request, id 0, seq 0, length 16
# HUT creates an NCE for TR2=INCOMPLETE
# HUT incorrectly sends NS for TR2(1061) when it should have sent echo-reply via TR1(1060)
6 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
7 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
8 08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32

Regards,
Matt.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-36-generic 6.8.0-36.36
ProcVersionSignature: Ubuntu 6.8.0-36.36-generic 6.8.4
Uname: Linux 6.8.0-36-generic x86_64
ApportVersion: 2.28.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/seq:        matt       2599 F.... pipewire
 /dev/snd/controlC0:  matt       2603 F.... wireplumber
CRDA: N/A
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Fri Jun 28 10:52:11 2024
InstallationDate: Installed on 2024-06-24 (4 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 002: ID 80ee:0021 VirtualBox USB Tablet
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
 /:  Bus 001.Port 001: Dev 001, Class=root_hub, Driver=ohci-pci/12p, 12M
     |__ Port 001: Dev 002, If 0, Class=Human Interface Device, Driver=usbhid, 12M
 /:  Bus 002.Port 001: Dev 001, Class=root_hub, Driver=ehci-pci/12p, 480M
MachineType: innotek GmbH VirtualBox
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 vmwgfxdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.8.0-36-generic root=UUID=d3096757-b767-4cf4-8b9c-c65a87bd4f4e ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-6.8.0-36-generic N/A
 linux-backports-modules-6.8.0-36-generic  N/A
 linux-firmware                            20240318.git3b128b60-0ubuntu2.1
RfKill:
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.board.name: VirtualBox
dmi.board.vendor: Oracle Corporation
dmi.board.version: 1.2
dmi.chassis.type: 1
dmi.chassis.vendor: Oracle Corporation
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:rvnOracleCorporation:rnVirtualBox:rvr1.2:cvnOracleCorporation:ct1:cvr:sku:
dmi.product.family: Virtual Machine
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-bug noble wayland-session

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2071397

Title:
  Wrong nexthop selection with two default routers where only one is
  REACHABLE

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2071397/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
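
For reference, the preference rule from RFC 4861 section 6.3.6 quoted in the PROBLEM section can be modeled with a short Python sketch. This is only an illustration of the expected behavior, not the kernel's nexthop selection code. The names NudState, Router and select_default_router are hypothetical and do not correspond to any kernel or systemd-networkd API. The fallback for the case where no router is reachable is simplified to "first in the list"; the RFC asks for round-robin selection in that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class NudState(Enum):
    """Neighbor Cache entry states named in RFC 4861."""
    INCOMPLETE = "INCOMPLETE"
    REACHABLE = "REACHABLE"
    STALE = "STALE"
    DELAY = "DELAY"
    PROBE = "PROBE"


@dataclass(frozen=True)
class Router:
    """Hypothetical model of one entry in the Default Router List."""
    name: str
    address: str
    state: Optional[NudState]  # None means no Neighbor Cache entry exists


def is_probably_reachable(router: Router) -> bool:
    # RFC 4861 6.3.6 (1): any state other than INCOMPLETE counts as
    # reachable or probably reachable; a missing entry does not.
    return router.state is not None and router.state is not NudState.INCOMPLETE


def select_default_router(routers: Sequence[Router]) -> Optional[Router]:
    """Prefer a (probably) reachable router over an unknown or suspect one."""
    for router in routers:
        if is_probably_reachable(router):
            return router
    # Simplification (assumption): with no reachable router, take the first
    # entry. The RFC recommends round-robin selection here instead.
    return routers[0] if routers else None


# The scenario from this report: TR1 is REACHABLE, TR2 is INCOMPLETE.
tr1 = Router("TR1", "fe80::200:10ff:fe10:1060", NudState.REACHABLE)
tr2 = Router("TR2", "fe80::200:10ff:fe10:1061", NudState.INCOMPLETE)

chosen = select_default_router([tr2, tr1])
print(chosen.name)  # TR1, regardless of the order of the list
```

Under this model the echo reply in step 7 would leave via TR1, which is the behavior the report expects from the multipath default route.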