The networkd-dispatcher scripts in configured.d and configuring.d that add and remove IP rules (included above) are likely the culprit. @ddstreet, would you like me to write that up more compactly?
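
For reference, a compact version of the add/remove cycle those two scripts perform might look roughly like the sketch below. It is only an illustration: the function names and the DRY_RUN switch are made up for this example, the table number, interface and address are taken from the scripts in the bug description, and I have not confirmed that this loop alone is enough to trigger the networkd crash.

```shell
#!/bin/bash
# Sketch only: cycles the policy rules that the dispatcher scripts add and
# remove. Function names and DRY_RUN are hypothetical, not part of any tool.

FWD_TABLE=99   # routing table number used by the configured.d script

# Print the ip command when DRY_RUN=1 (the default); run /sbin/ip otherwise.
run_ip() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "ip $*"
    else
        /sbin/ip "$@"
    fi
}

# Add the three rules that configured.d installs for one interface.
add_rules() {
    local iface="$1"
    local addr="$2"
    run_ip rule add iif "${iface}" lookup "${FWD_TABLE}"
    run_ip rule add oif "${iface}" lookup "${FWD_TABLE}"
    run_ip rule add from "${addr}/32" lookup "${FWD_TABLE}"
}

# Delete the same three rules, similar to what configuring.d tears down.
del_rules() {
    local iface="$1"
    local addr="$2"
    run_ip rule delete iif "${iface}" lookup "${FWD_TABLE}"
    run_ip rule delete oif "${iface}" lookup "${FWD_TABLE}"
    run_ip rule delete from "${addr}/32" lookup "${FWD_TABLE}"
}

# Repeat the add/delete cycle "count" times (default: once).
cycle_rules() {
    local iface="$1"
    local addr="$2"
    local count="${3:-1}"
    local i=0
    while [ "${i}" -lt "${count}" ]; do
        add_rules "${iface}" "${addr}"
        del_rules "${iface}" "${addr}"
        i=$((i + 1))
    done
}
```

With the default DRY_RUN=1, `cycle_rules ens6 10.0.3.171 2` only prints the commands it would run. Setting DRY_RUN=0 and running as root on a test instance while systemd-networkd is active would apply them for real, so that should only be done on a disposable machine.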

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1881972

Title:
  systemd-networkd crashes with invalid pointer

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  Incomplete

Bug description:
  [impact]

  systemd-networkd double-free causes crash under some circumstances,
  such as adding/removing ip rules

  [test case]

  see original description

  [regression potential]

  this strdup's strings during addition of routing policy rules, so any
  regression would likely occur when adding/modifying/removing ip rules,
  possibly including networkd segfault or failure to add/remove/modify
  ip rules.

  [scope]

  this is needed for bionic.

  this is fixed by upstream commit
  eeab051b28ba6e1b4a56d369d4c6bf7cfa71947c which is included starting in
  v240, so this is already included in Focal and later.

  I did not research what original commit introduced the problem, but
  the reporter indicates this did not happen for Xenial so it's unlikely
  this is a problem in Xenial or earlier.

  [original description]

  This is a serious regression with systemd-networkd that I ran into
  while setting up a NAT router in AWS. The AWS AMI
  ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20200131 with
  systemd-237-3ubuntu10.33 does NOT have the problem, but the next most
  recent AWS AMI
  ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20200311
  including systemd-237-3ubuntu10.39 does.

  Also, a system booted from the (good) 20200131 AMI starts showing the
  problem after updating only systemd (to 237-3ubuntu10.41) and its
  direct dependencies (e.g. 'apt-get install systemd'). So I'm fairly
  confident that a change to the systemd package between
  237-3ubuntu10.33 and 237-3ubuntu10.39 introduced the problem and it is
  still present.

  On the NAT router I use three interfaces and have separate routing
  tables for admin and forwarded traffic.
  Things come up fine initially but every 30-60 minutes (DHCP lease
  renewal time?) one or more interfaces is reconfigured and most of the
  time systemd-networkd will crash and need to be restarted. Eventually
  the system becomes unreachable when the default crash loop backoff
  logic prevents the network service from being restarted at all. The
  log excerpt attached illustrates the crash loop.

  Also including the netplan and networkd config files below.

  # grep . /etc/netplan/*
  /etc/netplan/50-cloud-init.yaml:# This file is generated from information provided by the datasource.  Changes
  /etc/netplan/50-cloud-init.yaml:# to it will not persist across an instance reboot.  To disable cloud-init's
  /etc/netplan/50-cloud-init.yaml:# network configuration capabilities, write a file
  /etc/netplan/50-cloud-init.yaml:# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
  /etc/netplan/50-cloud-init.yaml:# network: {config: disabled}
  /etc/netplan/50-cloud-init.yaml:network:
  /etc/netplan/50-cloud-init.yaml:    version: 2
  /etc/netplan/50-cloud-init.yaml:    ethernets:
  /etc/netplan/50-cloud-init.yaml:        ens5:
  /etc/netplan/50-cloud-init.yaml:            dhcp4: true
  /etc/netplan/50-cloud-init.yaml:            match:
  /etc/netplan/50-cloud-init.yaml:                macaddress: xx:xx:xx:xx:xx:xx
  /etc/netplan/50-cloud-init.yaml:            set-name: ens5
  /etc/netplan/99_config.yaml:network:
  /etc/netplan/99_config.yaml:  version: 2
  /etc/netplan/99_config.yaml:  renderer: networkd
  /etc/netplan/99_config.yaml:  ethernets:
  /etc/netplan/99_config.yaml:    ens6:
  /etc/netplan/99_config.yaml:      match:
  /etc/netplan/99_config.yaml:        macaddress: yy:yy:yy:yy:yy:yy
  /etc/netplan/99_config.yaml:      dhcp4: true
  /etc/netplan/99_config.yaml:      dhcp4-overrides:
  /etc/netplan/99_config.yaml:        use-routes: false
  /etc/netplan/99_config.yaml:    ens7:
  /etc/netplan/99_config.yaml:      match:
  /etc/netplan/99_config.yaml:        macaddress: zz:zz:zz:zz:zz:zz
  /etc/netplan/99_config.yaml:      mtu: 1500
  /etc/netplan/99_config.yaml:      dhcp4: true
  /etc/netplan/99_config.yaml:      dhcp4-overrides:
  /etc/netplan/99_config.yaml:        use-mtu: false
  /etc/netplan/99_config.yaml:        use-routes: false

  # grep . /etc/networkd-dispatcher/*/*
  /etc/networkd-dispatcher/configured.d/nat:#!/bin/bash
  /etc/networkd-dispatcher/configured.d/nat:# Do additional configuration for the inside and outside interfaces
  /etc/networkd-dispatcher/configured.d/nat:# route table used for forwarded/routed/natted traffic
  /etc/networkd-dispatcher/configured.d/nat:FWD_TABLE=99
  /etc/networkd-dispatcher/configured.d/nat:if [ "${IFACE}" = "ens6" ]; then
  /etc/networkd-dispatcher/configured.d/nat:  # delete link-local route for inside in default table
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip route delete 10.0.3.0/24 2>/dev/null || true
  /etc/networkd-dispatcher/configured.d/nat:  # add link-local route for inside in table 99
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip route replace 10.0.3.0/24 dev ens6 scope link src 10.0.3.171 table ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  # add routes to VPC cidrs via inside gateway in table 99
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip route replace 10.0.0.0/16 via 10.0.3.1 table ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  # add rules to use table 99
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip rule add iif ens6 lookup ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip rule add oif ens6 lookup ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip rule add from 10.0.3.171/32 lookup ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:elif [ "${IFACE}" = "ens7" ]; then
  /etc/networkd-dispatcher/configured.d/nat:  # delete link-local route for outside in default table
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip route delete 10.0.2.0/24 2>/dev/null || true
  /etc/networkd-dispatcher/configured.d/nat:  # add link-local route for outside in table 99
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip route replace 10.0.2.0/24 dev ens7 scope link src 10.0.2.245 table ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  # add default route via outside gateway in table 99
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip route replace default via 10.0.2.1 table ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  # add rules to use table 99
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip rule add iif ens7 lookup ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip rule add oif ens7 lookup ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/ip rule add from 10.0.2.245/32 lookup ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  # add rules to use the inet route for local traffic but only if it's not destined for an RFC1918 private range
  /etc/networkd-dispatcher/configured.d/nat:  # IMPORTANT: order matters; the priority of rules is reverse of the order in which they are added.
  /etc/networkd-dispatcher/configured.d/nat:  # so the default/fallback is added first and then the local overrides.
  /etc/networkd-dispatcher/configured.d/nat:  #/sbin/ip rule add iif lo lookup ${FWD_TABLE}
  /etc/networkd-dispatcher/configured.d/nat:  #ip rule add to 10.0.0.0/8 iif lo lookup main
  /etc/networkd-dispatcher/configured.d/nat:  #ip rule add to 172.16.0.0/12 iif lo lookup main
  /etc/networkd-dispatcher/configured.d/nat:  #ip rule add to 192.168.0.0/16 iif lo lookup main
  /etc/networkd-dispatcher/configured.d/nat:  # ensure the forward policy is accept
  /etc/networkd-dispatcher/configured.d/nat:  iptables -P FORWARD ACCEPT
  /etc/networkd-dispatcher/configured.d/nat:  # configure iptables to do NAT
  /etc/networkd-dispatcher/configured.d/nat:  /sbin/iptables -t nat -I POSTROUTING 1 -o ens7 -j SNAT --to-source 10.0.2.245
  /etc/networkd-dispatcher/configured.d/nat:  # clean up any other rules
  /etc/networkd-dispatcher/configured.d/nat:  while /sbin/iptables -t nat -D POSTROUTING 2 2>/dev/null; do :; done
  /etc/networkd-dispatcher/configured.d/nat:fi
  /etc/networkd-dispatcher/configuring.d/nat:#!/bin/bash
  /etc/networkd-dispatcher/configuring.d/nat:# Tear down existing ip rules so they aren't duplicated
  /etc/networkd-dispatcher/configuring.d/nat:if [ "${IFACE}" = "ens6" ]; then
  /etc/networkd-dispatcher/configuring.d/nat:  # flush any existing rules referenceing this interface
  /etc/networkd-dispatcher/configuring.d/nat:  OLDIFS="${IFS}"
  /etc/networkd-dispatcher/configuring.d/nat:  IFS="
  /etc/networkd-dispatcher/configuring.d/nat:"
  /etc/networkd-dispatcher/configuring.d/nat:  for rule in `ip rule show|egrep "ens6|10.0.3.171" | cut -d: -f2-`; do
  /etc/networkd-dispatcher/configuring.d/nat:    IFS="${OLDIFS}"
  /etc/networkd-dispatcher/configuring.d/nat:    ip rule delete ${rule}
  /etc/networkd-dispatcher/configuring.d/nat:  done
  /etc/networkd-dispatcher/configuring.d/nat:  IFS="${OLDIFS}"
  /etc/networkd-dispatcher/configuring.d/nat:elif [ "${IFACE}" = "ens7" ]; then
  /etc/networkd-dispatcher/configuring.d/nat:  # flush any existing rules referencing this interface
  /etc/networkd-dispatcher/configuring.d/nat:  OLDIFS="${IFS}"
  /etc/networkd-dispatcher/configuring.d/nat:  IFS="
  /etc/networkd-dispatcher/configuring.d/nat:"
  /etc/networkd-dispatcher/configuring.d/nat:  for rule in `ip rule show|egrep "ens7|10.0.2.245|iif lo" | cut -d: -f2-`; do
  /etc/networkd-dispatcher/configuring.d/nat:    IFS="${OLDIFS}"
  /etc/networkd-dispatcher/configuring.d/nat:    ip rule delete ${rule}
  /etc/networkd-dispatcher/configuring.d/nat:  done
  /etc/networkd-dispatcher/configuring.d/nat:  IFS="${OLDIFS}"
  /etc/networkd-dispatcher/configuring.d/nat:fi

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: systemd 237-3ubuntu10.39
  ProcVersionSignature: Ubuntu 4.15.0-1060.62-aws 4.15.18
  Uname: Linux 4.15.0-1060-aws x86_64
  ApportVersion: 2.20.9-0ubuntu7.11
  Architecture: amd64
  Date: Wed Jun 3 21:24:28 2020
  Ec2AMI: ami-0238c6e72a7e906fc
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1b
  Ec2InstanceType: c5n.large
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Amazon EC2 c5n.large
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-1060-aws root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
  SourcePackage: systemd
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 10/16/2017
  dmi.bios.vendor: Amazon EC2
  dmi.bios.version: 1.0
  dmi.board.asset.tag: i-0c058310742990713
  dmi.board.vendor: Amazon EC2
  dmi.chassis.asset.tag: Amazon EC2
  dmi.chassis.type: 1
  dmi.chassis.vendor: Amazon EC2
  dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:svnAmazonEC2:pnc5n.large:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:
  dmi.product.name: c5n.large
  dmi.sys.vendor: Amazon EC2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1881972/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp