** Description changed:

[ Impact ]

 * netplan-sriov-apply.service can sometimes fail to configure sriov interfaces.

 * The issue happens when netplan is performing per-interface configuration while udev rules are modifying PF interface names. If that happens, netplan fails to read some PF-related data because the expected /sys/class/net/<ifname>/ directory no longer exists.

 * Depending on the timing between netplan-sriov-apply.service and udev rules execution, one or more PF interfaces might be left unconfigured.

 * This issue might be a root cause of the following netplan bugs:
   - https://bugs.launchpad.net/netplan/+bug/1988018
   - https://bugs.launchpad.net/netplan/+bug/2020409

 * The proposed solution is to make sure that udev rules are triggered and finished before netplan-sriov-apply.service starts executing.

 * The issue was most likely introduced by https://bugs.launchpad.net/netplan/+bug/1988018
   - this change introduced netplan-sriov-apply.service
   - jammy 0.107.1-3ubuntu0.22.04.2 is still in -proposed
   - noble/questing/resolute released it as part of v1.0

 * The issue is reproduced when the user specifies a set-name config value with a name different from the one systemd networkd generated.
   - During the boot process, the interface is first renamed to ethX, then networkd applies its PCI-address-based naming, and only then does udev process the rules created from the set-name config value.
   - If set-name is not used, or the name specified in set-name is the same as the one networkd generated, the issue does not reproduce.
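The failure mode described in [ Impact ] can be sketched in a few lines of Python. This is only an illustration and not netplan's actual code: read_pf_attr and simulate_rename_race are hypothetical helpers, the sysfs tree is a temporary directory standing in for /sys/class/net, and the vendor value is a placeholder.

```python
import os
import tempfile


def read_pf_attr(ifname, attr, sysfs_root="/sys/class/net"):
    """Read <sysfs_root>/<ifname>/device/<attr> (hypothetical helper)."""
    path = os.path.join(sysfs_root, ifname, "device", attr)
    with open(path) as f:
        return f.read().strip()


def simulate_rename_race():
    """Mimic a udev rename happening between two sysfs reads."""
    # A fake sysfs tree lets the sketch run without real hardware.
    with tempfile.TemporaryDirectory() as root:
        old, new = "ens1f1np1", "ens1f1"
        os.makedirs(os.path.join(root, old, "device"))
        with open(os.path.join(root, old, "device", "vendor"), "w") as f:
            f.write("0x15b3\n")  # placeholder vendor ID

        before = read_pf_attr(old, "vendor", root)

        # The set-name rule is applied: the directory now lives under the new name.
        os.rename(os.path.join(root, old), os.path.join(root, new))

        try:
            read_pf_attr(old, "vendor", root)
            failed = False
        except FileNotFoundError:
            # Same class of error as "[Errno 2] No such file or directory".
            failed = True

        after = read_pf_attr(new, "vendor", root)
        return before, failed, after
```

The read through the stale name fails while the same attribute stays readable under the new name, which is why the outcome depends purely on when the rename lands relative to the service.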

[ Test Plan ]

 * Create a netplan config which modifies the interface name and sets sriov config, for instance:

   50-if.yaml:
   network:
     ethernets:
       ens1f0:
         match:
           macaddress: b8:3f:d2:09:38:94
         mtu: 1500
         optional: true
         set-name: ens1f0
       ens1f1:
         match:
           macaddress: b8:3f:d2:09:38:94
         mtu: 1500
         optional: true
         set-name: ens1f1

   99-sriov.yaml:
   network:
     version: 2
     ethernets:
       ens1f0:
         virtual-function-count: 32
         embedded-switch-mode: switchdev
         delay-virtual-functions-rebind: true
       ens1f1:
         virtual-function-count: 32
         embedded-switch-mode: switchdev
         delay-virtual-functions-rebind: true

   NOTE: the names generated for these interfaces by networkd are ens1f0np0 and ens1f1np1

 * Reboot the host with the above config.

 * After reboot, verify that the sriov configuration was properly applied on the interfaces.

Expected result:
Config was properly applied by netplan-sriov-apply.service

Actual results:
Feb 02 12:15:49 doopliss netplan[1163]: ERROR:root:could not determine vendor and device ID of ens1f1np1: [Errno 2] No such file or directory: '/sys/class/net/ens1f1np1/device/vendor'
Feb 02 12:15:49 doopliss systemd[1]: netplan-sriov-apply.service: Main process exited, code=exited, status=1/FAILURE
Feb 02 12:15:49 doopliss systemd[1]: netplan-sriov-apply.service: Failed with result 'exit-code'.

In this example, netplan-sriov-apply.service started around Feb 02 12:15:27 and properly configured the first interface using its old name, ens1f0np0.
Then the second interface, ens1f1np1, was renamed:
Feb 02 12:15:37 doopliss kernel: mlx5_core 0000:4b:00.1 ens1f1: renamed from ens1f1np1

Netplan, still using the name ens1f1np1, failed to read /sys/class/net/ens1f1np1/device/vendor, because the correct path was now /sys/class/net/ens1f1/device/vendor.

This is just an example. When an interface name changes while netplan-sriov-apply.service is running, netplan can fail in different parts of the code, resulting in a similar "[Errno 2] No such file or directory" error, such as the one mentioned in LP1988018:
Apr 16 15:44:44 romano netplan[1171]: failed parsing sriov_totalvfs for ens7f1np1: [Errno 2] No such file or directory: '/sys/class/net/ens7f1np1/device/sriov_totalvfs'

[ Where problems could occur ]

 * The proposed change makes sure that udev rules are triggered and done before netplan-sriov-apply.service starts. Inspecting the current `netplan apply` logic shows that this is already performed in the code for the `netplan apply` command but is missing from `netplan apply --sriov-only`, which is what netplan-sriov-apply.service calls.

 * If any other processes are modifying interface names, the issue can still be reproduced.

 * With the new change, the following commands will be executed:
   - udevadm control --reload
   - udevadm trigger --action=add --subsystem-match=net
   - udevadm settle
   If any of these commands hangs, the service might not start properly and might leave interfaces unconfigured.

[ Other Info ]

 * The issue can be reproduced quite reliably on jammy-proposed.

 * I was not able to reproduce the issue on Noble when applying the same configuration. By the time netplan-sriov-apply.service starts, the interfaces are already set to the proper names. This might point to differences in systemd. It does not mean that the issue can't be reproduced there: the service requires interface names to be already set, and the current settings do not guarantee that.

 * The fix was verified on the PS6 environment which reported the issues in LP2020409.
+
+  * Upstream PR: https://github.com/canonical/netplan/pull/569
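
The three udevadm commands listed under [ Where problems could occur ] can be sketched as a small Python helper. This is a minimal sketch and not the code from the upstream PR: sync_udev is a hypothetical name, and the per-command timeout is an assumption added here only to illustrate one way of bounding a hanging command.

```python
import subprocess

# The commands and their order are taken from the bug description.
UDEV_COMMANDS = [
    ["udevadm", "control", "--reload"],
    ["udevadm", "trigger", "--action=add", "--subsystem-match=net"],
    ["udevadm", "settle"],
]


def sync_udev(run=subprocess.check_call, timeout=60):
    """Run the udev commands in order and return the ones that completed.

    `run` is injectable so the sequence can be exercised without root
    privileges. With the default subprocess.check_call, a non-zero exit
    raises CalledProcessError and an overrun raises TimeoutExpired, so
    the loop stops at the first problem.
    The timeout value is an assumption, not part of the proposed fix.
    """
    executed = []
    for cmd in UDEV_COMMANDS:
        run(cmd, timeout=timeout)
        executed.append(cmd)
    return executed
```

A caller would invoke sync_udev() before starting the sriov configuration, and treat an exception as a reason to fail early instead of continuing with interface names that may still change.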

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2139598

Title:
  Netplan can crash when applying sriov config

To manage notifications about this bug go to:
https://bugs.launchpad.net/netplan/+bug/2139598/+subscriptions
