The SSH failure seems to be related to cloud-init not detecting the OVS bridge
and its slave interfaces correctly. As a result, the cloud-init 'init' stage
fails with an exception:
"RuntimeError: Not all expected physical devices present: {'52:54:00:d9:08:1c'}"

I'm working on a pull request here:
https://github.com/canonical/cloud-init/pull/608

In combination with the netplan PR, this should solve the issue
described here.
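
For illustration, the exception quoted above comes down to a set comparison
between the MAC addresses named in the network config and the devices that
detection actually finds. The sketch below is a simplified stand-in, not
cloud-init's actual code; the function name and arguments are hypothetical.
It assumes the failure mode is that the NIC enslaved to the OVS bridge is
missing from the detected set.

```python
def check_physical_devices(expected_macs, present_macs):
    """Raise if any expected MAC address has no matching detected device.

    Hypothetical helper for illustration only; not part of cloud-init.
    """
    # Normalise case so '52:54:00:D9:08:1C' and '52:54:00:d9:08:1c' match.
    expected = {mac.lower() for mac in expected_macs}
    present = {mac.lower() for mac in present_macs}

    missing = expected - present
    if missing:
        raise RuntimeError(
            "Not all expected physical devices present: %s" % missing
        )


if __name__ == "__main__":
    # Assumed scenario: the single NIC is not reported by detection,
    # so its MAC shows up as missing.
    try:
        check_physical_devices({"52:54:00:d9:08:1c"}, set())
    except RuntimeError as exc:
        print(exc)
```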

** Also affects: cloud-init
   Importance: Undecided
       Status: New

** Changed in: cloud-init
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1898997

Title:
  MAAS cannot deploy/boot if OVS bridge is configured on a single PXE
  NIC

Status in cloud-init:
  In Progress
Status in netplan:
  In Progress

Bug description:
  Problem description:
  If we try to deploy a single-NIC machine via MAAS, configuring an Open
  vSwitch bridge as the primary/PXE interface, the machine will install and
  boot Ubuntu 20.04 but it cannot finish the whole configuration (e.g.
  copying of SSH keys) and cannot be accessed/controlled via MAAS. It ends
  up in a "Failed" state.

  This is because systemd-networkd-wait-online.service fails (for some
  reason) before netplan can fully set up and configure the OVS bridge.
  Because of the broken networking, cloud-init cannot complete its final
  stages, such as setting up SSH keys or signaling its state back to MAAS.
  If we wait a little longer, the OVS bridge does come online and
  networking works, but SSH is still not set up and the MAAS state remains
  "Failed".

  Steps to reproduce:
  * Set up a (virtual) MAAS system, e.g. inside a LXD container using a KVM
    host, as described here:
    https://discourse.maas.io/t/setting-up-a-flexible-virtual-maas-test-environment/142
  * Install & set up the maas[-cli] snap from the 2.9/beta channel (instead
    of the deb/PPA from the discourse post)
  * Configure netplan PPA+key for testing via "Settings" -> "Package repos":
    https://launchpad.net/~slyon/+archive/ubuntu/ovs
  * Prepare curtin preseed in /var/snap/maas/current/preseeds/curtin_userdata,
    inside the LXD container (so you can access the broken machine afterwards):
  ======================
  #cloud-config
  debconf_selections:
   maas: |
    {{for line in str(curtin_preseed).splitlines()}}
    {{line}}
    {{endfor}}
  late_commands:
    maas: [wget, '--no-proxy', '{{node_disable_pxe_url}}', '--post-data', '{{node_disable_pxe_data}}', '-O', '/dev/null']
    90_create_user: ["curtin", "in-target", "--", "sh", "-c", "sudo useradd test -g 0 -G sudo"]
    92_set_user_password: ["curtin", "in-target", "--", "sh", "-c", "echo 'test:test' | sudo chpasswd"]
    94_cat: ["curtin", "in-target", "--", "sh", "-c", "cat /etc/passwd"]
  ======================
  * Compose a new virtual machine via MAAS' "KVM" menu, named e.g. "test1"
  * Watch it being commissioned via MAAS' "Machines" menu
  * Once it's ready select your machine (e.g. "test1.maas") -> Network
  * Select the single network interface (e.g. "ens4") -> Create bridge
  * Choose "Bridge type: Open vSwitch (ovs)", select "Subnet" and "IP mode",
    and save.
  * Deploy machine to Ubuntu 20.04 via "Take action" button

  The machine will install the OS and boot, but will end up in a
  "Failed" state inside MAAS due to network/OVS not being set up
  correctly. MAAS/SSH has no control over it. You can access the
  (broken) machine via serial console from the KVM-host (i.e. LXD
  container) via "virsh console test1" using the "test:test"
  credentials.
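
Once logged in on the serial console, the late arrival of the bridge can be
observed by polling the kernel's operstate file for the interface. This is
a minimal sketch under stated assumptions: the bridge name "br0" is
hypothetical (use whatever name MAAS assigned), and the function name and
parameters are made up for illustration. Some virtual interfaces (loopback,
for example) report "unknown" rather than "up", so the accepted states are
configurable.

```python
import os
import time


def wait_for_interface(name, timeout=60.0, interval=1.0,
                       sysfs_root="/sys/class/net", ok_states=("up",)):
    """Poll <sysfs_root>/<name>/operstate until it reports an accepted state.

    Returns True if the interface reached one of ok_states before the
    timeout expired, False otherwise. Hypothetical helper for illustration.
    """
    path = os.path.join(sysfs_root, name, "operstate")
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as handle:
                state = handle.read().strip()
        except OSError:
            # The interface does not exist (yet).
            state = "missing"
        if state in ok_states:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # "br0" is an assumed bridge name; adjust to the deployed config.
    came_up = wait_for_interface("br0", timeout=120.0,
                                 ok_states=("up", "unknown"))
    print("bridge online" if came_up else "bridge still down after timeout")
```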

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1898997/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
