** Merge proposal unlinked:
   
https://code.launchpad.net/~vanvugt/ubuntu/+source/initramfs-tools/+git/initramfs-tools/+merge/462481

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to initramfs-tools in Ubuntu.
https://bugs.launchpad.net/bugs/2056194

Title:
  Networking broken in early boot on Oracle Native instances due to MTU
  settings

Status in cloud-images:
  New
Status in cloud-init package in Ubuntu:
  Fix Released
Status in initramfs-tools package in Ubuntu:
  Fix Committed

Bug description:
  BACKGROUND:

  cloud-init-local.service runs before networking has started. On
  non-Oracle platforms, cloud-init therefore creates an ephemeral
  connection to the cloud's instance metadata service (IMDS) using DHCP
  in order to retrieve instance metadata. On Oracle this is normally
  unnecessary, because instances boot with connectivity to the IMDS out
  of the box. This can be seen on the following Jammy instance using an
  SR-IOV NIC:

  2024-03-05 14:09:05,351 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/opc/v2/instance/' with {'url': 'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'User-Agent': 'Cloud-Init/23.3.3-0ubuntu0~22.04.1', 'Authorization': 'Bearer Oracle'}} configuration
  2024-03-05 14:09:05,362 - url_helper.py[DEBUG]: Read from http://169.254.169.254/opc/v2/instance/ (200, 2663b) after 1 attempts
  2024-03-05 14:09:05,362 - ephemeral.py[DEBUG]: Skip ephemeral DHCP setup, instance has connectivity to {'url': 'http://169.254.169.254/opc/v2/instance/', 'headers': {'Authorization': 'Bearer Oracle'}, 'timeout': 5}
  2024-03-05 14:09:05,362 - url_helper.py[DEBUG]: [0/3] open 'http://169.254.169.254/opc/v2/instance/' with {'url': 'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/23.3.3-0ubuntu0~22.04.1', 'Authorization': 'Bearer Oracle'}} configuration
  2024-03-05 14:09:05,368 - url_helper.py[DEBUG]: Read from http://169.254.169.254/opc/v2/instance/ (200, 2663b) after 1 attempts

  Notice the "Skip ephemeral DHCP setup, instance has connectivity"
  message. It means that cloud-init has determined that it already has
  connectivity and needs no additional setup to retrieve data from the
  IMDS.
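  A minimal sketch of the decision described above, assuming a pluggable
  HTTP fetcher: make one time-bounded request to the IMDS and fall back to
  ephemeral DHCP only if it fails. The names `has_imds_connectivity`,
  `needs_ephemeral_dhcp`, `urllib_fetch` and the fake fetchers are
  hypothetical and not cloud-init's API; the 5-second timeout mirrors the
  `'timeout': 5.0` in the log above.

```python
# Hypothetical sketch of the "skip ephemeral DHCP" check; not cloud-init's real code.
import urllib.request
from typing import Callable, Mapping

IMDS_URL = "http://169.254.169.254/opc/v2/instance/"
IMDS_HEADERS = {"Authorization": "Bearer Oracle"}

# A fetcher takes (url, headers, timeout) and returns an HTTP status code.
# It should raise OSError (URLError and TimeoutError are both subclasses)
# when the request fails or times out.
Fetcher = Callable[[str, Mapping[str, str], float], int]


def urllib_fetch(url: str, headers: Mapping[str, str], timeout: float) -> int:
    """Real fetcher built on the standard library."""
    req = urllib.request.Request(url, headers=dict(headers))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def has_imds_connectivity(fetch: Fetcher, timeout: float = 5.0) -> bool:
    """True if a single, time-bounded request to the IMDS returns 200."""
    try:
        return fetch(IMDS_URL, IMDS_HEADERS, timeout) == 200
    except OSError:
        return False


def needs_ephemeral_dhcp(fetch: Fetcher = urllib_fetch) -> bool:
    """Bring up ephemeral DHCP only when the IMDS is not already reachable."""
    return not has_imds_connectivity(fetch)


# Offline stand-ins so the logic can be exercised without an Oracle instance.
def fake_reachable(url: str, headers: Mapping[str, str], timeout: float) -> int:
    return 200


def fake_unreachable(url: str, headers: Mapping[str, str], timeout: float) -> int:
    raise TimeoutError(f"read timed out (timeout={timeout})")


if __name__ == "__main__":
    print(needs_ephemeral_dhcp(fake_reachable))    # False: skip ephemeral DHCP
    print(needs_ephemeral_dhcp(fake_unreachable))  # True: fall back to DHCP
```

  The key property is that the probe itself carries a timeout, so an
  unreachable IMDS costs at most a few seconds instead of blocking boot.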

  We can also see the same behavior on a Noble paravirtualized instance:

  2024-03-01 20:51:33,482 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/opc/v2/instance/' with {'url': 'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'User-Agent': 'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} configuration
  2024-03-01 20:51:33,488 - url_helper.py[DEBUG]: Read from http://169.254.169.254/opc/v2/instance/ (200, 3067b) after 1 attempts
  2024-03-01 20:51:33,488 - ephemeral.py[DEBUG]: Skip ephemeral DHCP setup, instance has connectivity to {'url': 'http://169.254.169.254/opc/v2/instance/', 'headers': {'Authorization': 'Bearer Oracle'}, 'timeout': 5}
  2024-03-01 20:51:33,489 - url_helper.py[DEBUG]: [0/3] open 'http://169.254.169.254/opc/v2/instance/' with {'url': 'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} configuration
  2024-03-01 20:51:33,500 - url_helper.py[DEBUG]: Read from http://169.254.169.254/opc/v2/instance/ (200, 3067b) after 1 attempts
  2024-03-01 20:51:33,501 - util.py[DEBUG]: Writing to /run/cloud-init/cloud-id-oracle - wb: [644] 7 bytes

  PROBLEM:

  On a Noble instance using hardware-assisted (SR-IOV) networking, this
  does not work: cloud-init-local.service no longer has immediate
  connectivity to the IMDS. Since it cannot connect, it then attempts to
  create an ephemeral connection to the IMDS using DHCP. It is able to
  obtain a DHCP lease, but when it then tries to connect to the IMDS,
  the call just hangs. The call has no timeout, so the instance cannot
  be logged into, even via the serial console, because cloud-init is
  blocking the rest of boot. A simple cloud-init workaround is to add
  something along the lines of `timeout=2` to
  https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOracle.py#L349 .
  This lets boot proceed. Looking at the logs, we can see that
  cloud-init is unable to connect to the IMDS:

  2024-03-05 14:23:54,836 - ephemeral.py[DEBUG]: Received dhcp lease on ens3 for 10.0.0.133/255.255.255.0
  2024-03-05 14:23:54,837 - url_helper.py[DEBUG]: [0/3] open 'http://169.254.169.254/opc/v2/instance/' with {'url': 'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} configuration
  2024-03-05 14:23:56,841 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again
  2024-03-05 14:23:57,842 - url_helper.py[DEBUG]: [1/3] open 'http://169.254.169.254/opc/v2/instance/' with {'url': 'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} configuration
  2024-03-05 14:23:59,847 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again
  2024-03-05 14:24:00,847 - url_helper.py[DEBUG]: [2/3] open 'http://169.254.169.254/opc/v2/instance/' with {'url': 'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} configuration
  2024-03-05 14:24:02,852 - url_helper.py[DEBUG]: [0/3] open 'http://169.254.169.254/opc/v1/instance/' with {'url': 'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
  2024-03-05 14:24:04,855 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again
  2024-03-05 14:24:05,855 - url_helper.py[DEBUG]: [1/3] open 'http://169.254.169.254/opc/v1/instance/' with {'url': 'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
  2024-03-05 14:24:07,859 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again
  2024-03-05 14:24:08,859 - url_helper.py[DEBUG]: [2/3] open 'http://169.254.169.254/opc/v1/instance/' with {'url': 'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
  2024-03-05 14:24:10,863 - handlers.py[DEBUG]: finish: init-local/search-Oracle: FAIL: no local data found from DataSourceOracle
  2024-03-05 14:24:10,863 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceOracle.DataSourceOracle'> failed
  2024-03-05 14:24:10,863 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceOracle.DataSourceOracle'> failed
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 370, in read_opc_metadata
      instance_data = _fetch(metadata_version, path="instance")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 346, in _fetch
      return readurl(
             ^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 370, in readurl
      raise excps[-1]
  cloudinit.url_helper.UrlError: HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=2.0)

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 1028, in find_source
      if s.update_metadata_if_supported(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 914, in update_metadata_if_supported
      result = self.get_data()
               ^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 460, in get_data
      return_value = self._check_and_get_data()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 392, in _check_and_get_data
      return self._get_data()
             ^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 165, in _get_data
      fetched_metadata = read_opc_metadata(
                         ^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 373, in read_opc_metadata
      instance_data = _fetch(metadata_version, path="instance")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 346, in _fetch
      return readurl(
             ^^^^^^^^
    File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 370, in readurl
      raise excps[-1]
  cloudinit.url_helper.UrlError: HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=2.0)
  2024-03-05 14:24:10,898 - main.py[DEBUG]: No local datasource found
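
  With the `timeout=2` workaround in place, the log above follows a
  bounded schedule: the v2 endpoint is tried three times, then v1 three
  times, each attempt ending in a 2-second read timeout with a 1-second
  pause between attempts (none after the last). That gives at most
  2 * (3 * 2 + 2 * 1) = 16 seconds, which matches the span from
  14:23:54,837 to 14:24:10,863. Without a timeout, the same loop would
  block on its first attempt indefinitely, which is the hang described
  above. A minimal sketch of that schedule, assuming a pluggable fetcher;
  `fetch_instance_metadata`, `worst_case_seconds` and `fake_timeout` are
  hypothetical names, not cloud-init's `readurl` API.

```python
# Hypothetical sketch of bounded IMDS retries with v2 -> v1 fallback; not cloud-init's code.
import time
from typing import Callable, List, Mapping, Optional, Tuple

# (url, headers) pairs tried in order, as in the log: v2 with the Oracle
# bearer token, then v1 without it.
ENDPOINTS: List[Tuple[str, Mapping[str, str]]] = [
    ("http://169.254.169.254/opc/v2/instance/", {"Authorization": "Bearer Oracle"}),
    ("http://169.254.169.254/opc/v1/instance/", {}),
]

# A fetcher takes (url, headers, timeout), returns the body, raises OSError on failure.
Fetcher = Callable[[str, Mapping[str, str], float], bytes]


def fetch_instance_metadata(
    fetch: Fetcher,
    attempts: int = 3,
    timeout: float = 2.0,
    wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Try each endpoint up to `attempts` times; return None if all fail."""
    for url, headers in ENDPOINTS:
        for attempt in range(attempts):
            try:
                return fetch(url, headers, timeout)
            except OSError:
                # Pause between attempts but not after the last one,
                # matching the "Please wait 1 seconds" lines in the log.
                if attempt < attempts - 1:
                    sleep(wait)
    return None


def worst_case_seconds(endpoints: int, attempts: int, timeout: float, wait: float) -> float:
    """Upper bound on elapsed time when every attempt runs into its timeout."""
    return endpoints * (attempts * timeout + (attempts - 1) * wait)


calls: List[str] = []


def fake_timeout(url: str, headers: Mapping[str, str], timeout: float) -> bytes:
    """Stand-in fetcher that always times out and records which URL was tried."""
    calls.append(url)
    raise TimeoutError(f"Read timed out. (read timeout={timeout})")


if __name__ == "__main__":
    print(fetch_instance_metadata(fake_timeout, sleep=lambda s: None))  # None
    print(len(calls))                                                   # 6
    print(worst_case_seconds(2, 3, 2.0, 1.0))                           # 16.0
```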

  Despite this, cloud-init is still able to read and render the
  networking configuration sourced from initramfs:

  2024-03-05 14:24:10,899 - util.py[DEBUG]: Read 272 bytes from /run/net-ens3.conf
  ...
  2024-03-05 14:24:10,914 - stages.py[INFO]: Applying network configuration from initramfs bringup=False: {'config': [{'type': 'physical', 'name': 'ens3', 'subnets': [{'type': 'dhcp', 'control': 'manual', 'netmask': '255.255.255.0', 'broadcast': '10.0.0.255', 'gateway': '10.0.0.1', 'dns_nameservers': ['169.254.169.254']}], 'mac_address': '02:00:17:0f:50:8d'}], 'version': 1}
  2024-03-05 14:24:10,914 - util.py[DEBUG]: Writing to /run/cloud-init/sem/apply_network_config.once - wb: [644] 23 bytes
  2024-03-05 14:24:10,915 - distros[DEBUG]: Selected renderer 'netplan' from priority list: ['netplan', 'eni', 'sysconfig']
  2024-03-05 14:24:10,918 - subp.py[DEBUG]: Running command ['netplan', 'info'] with allowed return codes [0] (shell=False, capture=True)
  2024-03-05 14:24:11,109 - subp.py[DEBUG]: command ['netplan', 'info'] took 0.1s to run
  2024-03-05 14:24:11,109 - util.py[DEBUG]: Attempting to load yaml from string of length 332 with allowed root types (<class 'dict'>,)
  2024-03-05 14:24:11,111 - util.py[DEBUG]: Writing to /etc/netplan/50-cloud-init.yaml - wb: [600] 481 bytes
  2024-03-05 14:24:11,111 - subp.py[DEBUG]: Running command ['netplan', 'generate'] with allowed return codes [0] (shell=False, capture=True)
  2024-03-05 14:24:11,300 - subp.py[DEBUG]: command ['netplan', 'generate'] took 0.1s to run
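
  The configuration applied here contains a single physical interface,
  ens3, set up for DHCP; no secondary interfaces appear in it. As a rough
  illustration of the v1-to-netplan step, here is a simplified,
  hypothetical converter (`v1_to_netplan` is not a cloud-init function)
  fed with the dict from the stages.py log line. cloud-init's real netplan
  renderer handles many more interface types and keys, so treat the output
  shape as an approximation.

```python
# Simplified, hypothetical v1 -> netplan-style conversion; not cloud-init's renderer.
from typing import Any, Dict

# Network config copied from the stages.py log line above.
V1_CONFIG: Dict[str, Any] = {
    "config": [
        {
            "type": "physical",
            "name": "ens3",
            "subnets": [
                {
                    "type": "dhcp",
                    "control": "manual",
                    "netmask": "255.255.255.0",
                    "broadcast": "10.0.0.255",
                    "gateway": "10.0.0.1",
                    "dns_nameservers": ["169.254.169.254"],
                }
            ],
            "mac_address": "02:00:17:0f:50:8d",
        }
    ],
    "version": 1,
}


def v1_to_netplan(v1: Dict[str, Any]) -> Dict[str, Any]:
    """Map physical NICs to netplan-style 'ethernets' entries (DHCPv4 only)."""
    ethernets: Dict[str, Any] = {}
    for iface in v1.get("config", []):
        if iface.get("type") != "physical":
            continue  # bonds, VLANs, etc. are out of scope for this sketch
        entry: Dict[str, Any] = {}
        mac = iface.get("mac_address")
        if mac:
            # Pin the interface name to the NIC's MAC address.
            entry["match"] = {"macaddress": mac}
            entry["set-name"] = iface["name"]
        # Only the subnet type is used here; fields such as gateway and
        # netmask are ignored by this sketch.
        if any(s.get("type") in ("dhcp", "dhcp4") for s in iface.get("subnets", [])):
            entry["dhcp4"] = True
        ethernets[iface["name"]] = entry
    return {"network": {"version": 2, "ethernets": ethernets}}


if __name__ == "__main__":
    print(v1_to_netplan(V1_CONFIG))
```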

  This allows networking to come up as expected on the primary
  interface, but cloud-init has been unable to fetch userdata/metadata
  or retrieve information about any secondary interfaces.

  SUMMARY:

  I see two separate issues here:

  1. cloud-init should be able to cope with the lack of network
     connectivity in early boot. This can be fixed on the cloud-init side.
  2. Early boot network connectivity works on every other series and
     instance type, but not on Noble with hardware-assisted (SR-IOV)
     networking.

  I am unsure of the cause of #2.
