Public bug reported:

Hi,

NOTE: I was unsure where to report this bug: some guidance says to
report it against the distro, other guidance says to use the mailing
list. (See also:
https://marc.info/?l=linux-netdev&m=171953240705042&w=2)

This appears to be a bug in Linux kernel networking. This was observed
on a fresh install of Ubuntu 24.04, with Linux 6.8.0-36-generic.

PROBLEM
In the network diagram below, I have two default routers (TR1 and TR2). The HUT
has two neighbor cache entries: TR1=REACHABLE and TR2=INCOMPLETE. When I ping
the host (HUT) from a remote test node (TN2) via TR1, the HUT sends an NS for
TR2 when it should have replied directly via TR1. This breaks communication
and violates IPv6 Logo compliance.

            TN2
             |
    +--------+--------+
    |                 |
   TR1               TR2
(REACHABLE)      (INCOMPLETE)
    |                 |
    +--------+--------+
             |
            HUT

The RFC for Neighbor Discovery describes the policy for selecting routers
from the Default Router List. The relevant bullet is quoted below.

RFC4861 6.3.6. Default Router Selection
 The policy for selecting routers from the Default Router List is as
 follows:

 1) Routers that are reachable or probably reachable (i.e., in any
    state other than INCOMPLETE) SHOULD be preferred over routers
    whose reachability is unknown or suspect (i.e., in the
    INCOMPLETE state, or for which no Neighbor Cache entry exists).
    Further implementation hints on default router selection when
    multiple equivalent routers are available are discussed in
    [LD-SHRE].
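
As a rough illustration of the behaviour I expected, here is a minimal Python sketch of bullet 1. It is not kernel code. The names Router, PROBABLY_REACHABLE and select_default_router are made up for this example, and falling back to the first router when none is reachable is a simplification (the RFC describes round-robin for that case).

```python
from dataclasses import dataclass
from typing import List, Optional

# Neighbor Cache states other than INCOMPLETE, per RFC 4861.
PROBABLY_REACHABLE = {"REACHABLE", "STALE", "DELAY", "PROBE"}


@dataclass
class Router:
    address: str
    nce_state: Optional[str]  # None models "no Neighbor Cache entry exists"


def select_default_router(routers: List[Router]) -> Optional[Router]:
    """Prefer a router whose NCE state is anything other than INCOMPLETE."""
    for router in routers:
        if router.nce_state in PROBABLY_REACHABLE:
            return router
    # Simplified fallback (assumption): no reachable router, take the first.
    return routers[0] if routers else None


tr1 = Router("fe80::200:10ff:fe10:1060", "REACHABLE")
tr2 = Router("fe80::200:10ff:fe10:1061", "INCOMPLETE")

# TR2 is listed first on purpose; the policy should still pick TR1.
chosen = select_default_router([tr2, tr1])
print(chosen.address)
```

With the two routers from this report, the sketch picks TR1 regardless of list order, which is the selection I expected from the HUT.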

REPRODUCER
This condition is created by configuring two routers under systemd-networkd, 
either by having each router send an RA, or statically configuring one router 
at a time. I show the steps for the static configuration below.

Assuming you have an interface named “enp0s9” and you’re using
systemd-networkd as the network manager:

1.      Configure the Host (HUT) with one router (TR1)
$ networkctl cat 10-enp0s9.network
# /etc/systemd/network/10-enp0s9.network
[Match]
Name=enp0s9

[Link]
RequiredForOnline=no

[Network]
Description="Internal Network: Private VM-to-VM IPv6 interface"
DHCP=no
LLDP=no
EmitLLDP=no


# /etc/systemd/network/10-enp0s9.network.d/address.conf
[Network]
Address=2001:2:0:1000:a00:27ff:fe5f:f72d/64


# /etc/systemd/network/10-enp0s9.network.d/route-1060.conf
[Route]
Gateway=fe80::200:10ff:fe10:1060
GatewayOnLink=true

2.      Start or reload the configuration
$ sudo networkctl reload
$ sudo networkctl reconfigure enp0s9
$ ip -6 r
2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
default via fe80::200:10ff:fe10:1060 dev enp0s9 proto static metric 1024 onlink pref medium

3.      Flush and Monitor the neighbor cache
$ sudo ip -6 neigh flush all; ip -6 -ts monitor neigh

4.      From TN1, ping HUT via TR1 – the HUT’s NCE is updated to REACHABLE
[2024-06-28T08:13:27.617674] fe80::200:10ff:fe10:1060 dev enp0s9 lladdr 00:00:10:10:10:60 router REACHABLE

NOTE: tcpdump shows the expected protocol exchange.

5.      Configure the Host (HUT) with a 2nd router (TR2)
$ cat /etc/systemd/network/10-enp0s9.network.d/route-1061.conf 
[Route]
Gateway=fe80::200:10ff:fe10:1061
GatewayOnLink=true
$ sudo networkctl reload
$ sudo networkctl reconfigure enp0s9
$ ip -6 r
2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
default proto static metric 1024 pref medium
     nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1 
     nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1

6.      Start monitoring traffic with tcpdump/WireShark

7.      From TN1, ping HUT via TR1
a.      An echo reply is never received
b.      The protocol exchange shows the HUT sends an NS for TR2 (which is NOT
REACHABLE) when it should have sent an echo-reply via TR1 (which is REACHABLE).
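
To check the neighbor cache state at each step without reading the monitor output by eye, something like the following Python sketch could be used. It only parses text in the shape shown in step 4. The function name parse_neigh_line is hypothetical, and the second sample line (TR2 as INCOMPLETE, without an lladdr) is my assumption of what such an entry looks like; it was not captured in this report.

```python
import re

# Matches lines such as those printed by "ip -6 -ts monitor neigh".
NEIGH_LINE = re.compile(
    r"^(?:\[(?P<ts>[^\]]+)\]\s+)?"        # optional [timestamp]
    r"(?P<addr>\S+)\s+dev\s+(?P<dev>\S+)"  # address and interface
    r"(?:\s+lladdr\s+(?P<lladdr>\S+))?"    # link-layer address, if resolved
    r"(?P<router>\s+router)?"              # router flag
    r"\s+(?P<state>[A-Z]+)\s*$"            # NUD state, e.g. REACHABLE
)


def parse_neigh_line(line):
    """Return a dict describing one neighbor entry, or None if unparsable."""
    m = NEIGH_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "address": m.group("addr"),
        "dev": m.group("dev"),
        "lladdr": m.group("lladdr"),
        "is_router": m.group("router") is not None,
        "state": m.group("state"),
    }


# Line taken from step 4 of this report.
tr1_line = ("[2024-06-28T08:13:27.617674] fe80::200:10ff:fe10:1060 dev enp0s9 "
            "lladdr 00:00:10:10:10:60 router REACHABLE")
# Hypothetical line for TR2 (assumed format, not from the capture).
tr2_line = "fe80::200:10ff:fe10:1061 dev enp0s9 router INCOMPLETE"

tr1_entry = parse_neigh_line(tr1_line)
tr2_entry = parse_neigh_line(tr2_line)
print(tr1_entry["state"], tr2_entry["state"])
```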

OBSERVATIONS
1.      When NOT using systemd-networkd and each router sends an RA, the kernel
behaves correctly.
2.      The routing table looks different, depending on whether the kernel adds 
the route or systemd-networkd adds the route. E.g.
a.      Kernel adds two separate “default route” entries (systemd-networkd is 
stopped)
$ ip -6 route
<deleted lines>
default via fe80::200:10ff:fe10:1060 proto ra metric 1024 expires 39sec hoplimit 64 pref medium
default via fe80::200:10ff:fe10:1061 proto ra metric 1024 expires 44sec hoplimit 64 pref medium
b.      Systemd-networkd adds one “default route” with two nexthop options 
(systemd-networkd is running)
$ ip -6 route
<deleted lines>
default proto ra metric 1024 expires 589sec pref medium
 nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
 nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
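
The difference between the two routing tables can be detected mechanically. The Python sketch below classifies "ip -6 route" text as having separate default routes or a single multipath default route. The function name summarize_default_routes is made up for this example, the sample text is copied from the two outputs above (with the wrapped lines joined), and the parser only handles the layout shown here.

```python
def summarize_default_routes(ip_route_output):
    """Return (kind, gateways) where kind is "separate", "multipath" or "none"."""
    separate, multipath = [], []
    in_multipath_default = False
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields:
            in_multipath_default = False
        elif fields[0] == "default":
            if "via" in fields:
                # One default route per gateway (what the kernel installs).
                separate.append(fields[fields.index("via") + 1])
                in_multipath_default = False
            else:
                # A default route whose gateways follow as nexthop lines.
                in_multipath_default = True
        elif fields[0] == "nexthop" and in_multipath_default and "via" in fields:
            multipath.append(fields[fields.index("via") + 1])
        else:
            in_multipath_default = False
    if multipath:
        return "multipath", multipath
    if separate:
        return "separate", separate
    return "none", []


kernel_table = """\
default via fe80::200:10ff:fe10:1060 proto ra metric 1024 expires 39sec hoplimit 64 pref medium
default via fe80::200:10ff:fe10:1061 proto ra metric 1024 expires 44sec hoplimit 64 pref medium
"""

networkd_table = """\
default proto ra metric 1024 expires 589sec pref medium
 nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
 nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
"""

print(summarize_default_routes(kernel_table))
print(summarize_default_routes(networkd_table))
```

In my testing the wrong nexthop was only chosen in the multipath case, so a check like this may help confirm which layout a given system has.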

TCPDUMP
For completeness, here is the annotated output from tcpdump…

$ tcpdump -r ~/v6LC_2_2_11-bug-report-summary.pcapng -t -n --number -e
reading from file /home/matt/v6LC_2_2_11-bug-report-summary.pcapng, link-type EN10MB (Ethernet), snapshot length 262144

    # Step 4:  TN1(1181) pings HUT(f72d) via TR1(1060)
    1  00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1001:200:10ff:fe10:1181 > 2001:2:0:1000:a00:27ff:fe5f:f72d: ICMP6, echo request, id 0, seq 0, length 16
    2  08:00:27:5f:f7:2d > 33:33:ff:10:10:60, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1060: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1060, length 32
    3  00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 86: fe80::200:10ff:fe10:1060 > fe80::a00:27ff:fe5f:f72d: ICMP6, neighbor advertisement, tgt is fe80::200:10ff:fe10:1060, length 32
    4  08:00:27:5f:f7:2d > 00:00:10:10:10:60, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1000:a00:27ff:fe5f:f72d > 2001:2:0:1001:200:10ff:fe10:1181: ICMP6, echo reply, id 0, seq 0, length 16

    # HUT has replied to TN1 via TR1.  NCE for TR1=REACHABLE

    # Step 5: Now configure TR2 
    # Step 7:   TN1(1181) pings HUT(f72d) via TR1(1060)
    5  00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1001:200:10ff:fe10:1181 > 2001:2:0:1000:a00:27ff:fe5f:f72d: ICMP6, echo request, id 0, seq 0, length 16

    # HUT creates an NCE for TR2=INCOMPLETE

    # HUT incorrectly sends NS for TR2(1061) when it should have sent echo-reply via TR1(1060)
    6  08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
    7  08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
    8  08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
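
For anyone reading the capture, the destination of packets 6 to 8 can be derived from the NS target. The Python sketch below computes the solicited-node multicast address (RFC 4291) and the corresponding Ethernet multicast MAC (RFC 2464) for TR2's link-local address. The helper names are made up for this example; only the standard ipaddress module is used.

```python
import ipaddress


def solicited_node_multicast(target):
    """ff02::1:ffXX:XXXX built from the low 24 bits of the target address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def ipv6_multicast_mac(group):
    """33:33 followed by the low 32 bits of the IPv6 multicast address."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{octet:02x}" for octet in octets)


tr2 = "fe80::200:10ff:fe10:1061"
group = solicited_node_multicast(tr2)
mac = ipv6_multicast_mac(group)
print(group, mac)
```

The computed values match the destination address and destination MAC seen in packets 6 to 8, which confirms those solicitations are for TR2 and not for TR1.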

Regards,
Matt.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-36-generic 6.8.0-36.36
ProcVersionSignature: Ubuntu 6.8.0-36.36-generic 6.8.4
Uname: Linux 6.8.0-36-generic x86_64
ApportVersion: 2.28.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/seq:        matt       2599 F.... pipewire
 /dev/snd/controlC0:  matt       2603 F.... wireplumber
CRDA: N/A
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Fri Jun 28 10:52:11 2024
InstallationDate: Installed on 2024-06-24 (4 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 002: ID 80ee:0021 VirtualBox USB Tablet
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
 /:  Bus 001.Port 001: Dev 001, Class=root_hub, Driver=ohci-pci/12p, 12M
     |__ Port 001: Dev 002, If 0, Class=Human Interface Device, Driver=usbhid, 12M
 /:  Bus 002.Port 001: Dev 001, Class=root_hub, Driver=ehci-pci/12p, 480M
MachineType: innotek GmbH VirtualBox
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 vmwgfxdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.8.0-36-generic root=UUID=d3096757-b767-4cf4-8b9c-c65a87bd4f4e ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-6.8.0-36-generic N/A
 linux-backports-modules-6.8.0-36-generic  N/A
 linux-firmware                            20240318.git3b128b60-0ubuntu2.1
RfKill:
 
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.board.name: VirtualBox
dmi.board.vendor: Oracle Corporation
dmi.board.version: 1.2
dmi.chassis.type: 1
dmi.chassis.vendor: Oracle Corporation
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:rvnOracleCorporation:rnVirtualBox:rvr1.2:cvnOracleCorporation:ct1:cvr:sku:
dmi.product.family: Virtual Machine
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug noble wayland-session

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2071397

Title:
  Wrong nexthop selection with two default routers where only one is
  REACHABLE
