You have been subscribed to a public bug:

BACKGROUND:

cloud-init-local.service runs before networking has started. On non-
Oracle platforms, before networking has come up, cloud-init will create
an ephemeral connection to the cloud's IMDS using DHCP to retrieve
instance metadata. On Oracle, this normally isn't necessary as we boot
with connectivity to the IMDS out of the box. This can be seen in the
following Jammy instance using an SR-IOV NIC:

2024-03-05 14:09:05,351 - url_helper.py[DEBUG]: [0/1] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 5.0, 'headers': {'User-Agent': 
'Cloud-Init/23.3.3-0ubuntu0~22.04.1', 'Authorization': 'Bearer Oracle'}} 
configuration
2024-03-05 14:09:05,362 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/opc/v2/instance/ (200, 2663b) after 1 attempts
2024-03-05 14:09:05,362 - ephemeral.py[DEBUG]: Skip ephemeral DHCP setup, 
instance has connectivity to {'url': 'http://169.254.169.254/opc/v2/instance/', 
'headers': {'Authorization': 'Bearer Oracle'}, 'timeout': 5}
2024-03-05 14:09:05,362 - url_helper.py[DEBUG]: [0/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'headers': {'User-Agent': 
'Cloud-Init/23.3.3-0ubuntu0~22.04.1', 'Authorization': 'Bearer Oracle'}} 
configuration
2024-03-05 14:09:05,368 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/opc/v2/instance/ (200, 2663b) after 1 attempts

Notice the "Skip ephemeral DHCP setup, instance has connectivity"
message. It means that cloud-init has determined that it already has
connectivity and doesn't need to do any additional setup to retrieve
data from the IMDS.

We can also see the same behavior on a Noble paravirtualized instance:

2024-03-01 20:51:33,482 - url_helper.py[DEBUG]: [0/1] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 5.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
2024-03-01 20:51:33,488 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/opc/v2/instance/ (200, 3067b) after 1 attempts
2024-03-01 20:51:33,488 - ephemeral.py[DEBUG]: Skip ephemeral DHCP setup, 
instance has connectivity to {'url': 'http://169.254.169.254/opc/v2/instance/', 
'headers': {'Authorization': 'Bearer Oracle'}, 'timeout': 5}
2024-03-01 20:51:33,489 - url_helper.py[DEBUG]: [0/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
2024-03-01 20:51:33,500 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/opc/v2/instance/ (200, 3067b) after 1 attempts
2024-03-01 20:51:33,501 - util.py[DEBUG]: Writing to 
/run/cloud-init/cloud-id-oracle - wb: [644] 7 bytes
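
The decision visible in both logs can be sketched roughly as follows. This
is a simplified illustration, not cloud-init's actual code: the names
`imds_reachable` and `choose_setup` are made up for this example, and only
the URL, the header and the 5-second timeout are taken from the log lines
above.

```python
import urllib.error
import urllib.request

# URL, header and timeout as they appear in the log lines above.
IMDS_URL = "http://169.254.169.254/opc/v2/instance/"
IMDS_HEADERS = {"Authorization": "Bearer Oracle"}


def imds_reachable(url=IMDS_URL, headers=IMDS_HEADERS, timeout=5.0,
                   opener=urllib.request.urlopen):
    """Return True if a single GET to the IMDS answers with HTTP 200."""
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, unreachable networks and timeouts.
        return False


def choose_setup(reachable):
    """Map the probe result to the next step (hypothetical labels)."""
    return "skip-ephemeral-dhcp" if reachable else "ephemeral-dhcp"
```

The `opener` parameter exists only so the probe can be exercised without a
real IMDS; on an instance, `choose_setup(imds_reachable())` would pick
between the two paths described in this report.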

PROBLEM:

On a Noble instance using Hardware-assisted (SR-IOV) networking, this is
not working. cloud-init-local.service no longer has immediate
connectivity to the IMDS. Since it cannot connect, it then attempts to
create an ephemeral connection to the IMDS using DHCP. It is able to
obtain a DHCP lease, but when it then tries to connect to the IMDS, the
call just hangs. The call has no timeout, so this results in an instance
that cannot be logged into, even via the serial console, because
cloud-init is blocking the rest of boot. A simple cloud-init workaround
is to add something along the lines of `timeout=2` to
https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOracle.py#L349 .
This allows the instance to boot. With that workaround in place, the
logs show that cloud-init is unable to connect to the IMDS:

2024-03-05 14:23:54,836 - ephemeral.py[DEBUG]: Received dhcp lease on ens3 for 
10.0.0.133/255.255.255.0
2024-03-05 14:23:54,837 - url_helper.py[DEBUG]: [0/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
2024-03-05 14:23:56,841 - url_helper.py[DEBUG]: Please wait 1 seconds while we 
wait to try again
2024-03-05 14:23:57,842 - url_helper.py[DEBUG]: [1/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
2024-03-05 14:23:59,847 - url_helper.py[DEBUG]: Please wait 1 seconds while we 
wait to try again
2024-03-05 14:24:00,847 - url_helper.py[DEBUG]: [2/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
2024-03-05 14:24:02,852 - url_helper.py[DEBUG]: [0/3] open 
'http://169.254.169.254/opc/v1/instance/' with {'url': 
'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
2024-03-05 14:24:04,855 - url_helper.py[DEBUG]: Please wait 1 seconds while we 
wait to try again
2024-03-05 14:24:05,855 - url_helper.py[DEBUG]: [1/3] open 
'http://169.254.169.254/opc/v1/instance/' with {'url': 
'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
2024-03-05 14:24:07,859 - url_helper.py[DEBUG]: Please wait 1 seconds while we 
wait to try again
2024-03-05 14:24:08,859 - url_helper.py[DEBUG]: [2/3] open 
'http://169.254.169.254/opc/v1/instance/' with {'url': 
'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
2024-03-05 14:24:10,863 - handlers.py[DEBUG]: finish: init-local/search-Oracle: 
FAIL: no local data found from DataSourceOracle
2024-03-05 14:24:10,863 - util.py[WARNING]: Getting data from <class 
'cloudinit.sources.DataSourceOracle.DataSourceOracle'> failed
2024-03-05 14:24:10,863 - util.py[DEBUG]: Getting data from <class 
'cloudinit.sources.DataSourceOracle.DataSourceOracle'> failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", 
line 370, in read_opc_metadata
    instance_data = _fetch(metadata_version, path="instance")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", 
line 346, in _fetch
    return readurl(
           ^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 370, in 
readurl
    raise excps[-1]
cloudinit.url_helper.UrlError: HTTPConnectionPool(host='169.254.169.254', 
port=80): Read timed out. (read timeout=2.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
1028, in find_source
    if s.update_metadata_if_supported(
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
914, in update_metadata_if_supported
    result = self.get_data()
             ^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
460, in get_data
    return_value = self._check_and_get_data()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
392, in _check_and_get_data
    return self._get_data()
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", 
line 165, in _get_data
    fetched_metadata = read_opc_metadata(
                       ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", 
line 373, in read_opc_metadata
    instance_data = _fetch(metadata_version, path="instance")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", 
line 346, in _fetch
    return readurl(
           ^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 370, in 
readurl
    raise excps[-1]
cloudinit.url_helper.UrlError: HTTPConnectionPool(host='169.254.169.254', 
port=80): Read timed out. (read timeout=2.0)
2024-03-05 14:24:10,898 - main.py[DEBUG]: No local datasource found
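
The retry pattern in the log above (three attempts per API version, a
2-second timeout, and a 1-second pause between attempts) can be sketched as
follows. This is an illustration under assumptions rather than the
cloud-init implementation: `fetch_with_retries` and `worst_case_seconds`
are hypothetical helpers, and the `fetch` callable stands in for whatever
performs the HTTP request.

```python
import time


def fetch_with_retries(fetch, url, timeout=2.0, attempts=3, sec_between=1.0,
                       sleep=time.sleep):
    """Call fetch(url, timeout) up to `attempts` times; re-raise the last error."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url, timeout)
        except OSError as error:  # timeouts and connection errors
            last_error = error
            if attempt < attempts - 1:
                sleep(sec_between)  # no pause after the final attempt
    raise last_error


def worst_case_seconds(timeout=2.0, attempts=3, sec_between=1.0, versions=2):
    """Upper bound on time spent when every attempt on every version times out."""
    per_version = attempts * timeout + (attempts - 1) * sec_between
    return versions * per_version
```

With the values from the log, `worst_case_seconds()` gives 16 seconds,
which matches the gap between the first request at 14:23:54,837 and the
failure at 14:24:10,863. Without a timeout there is no such bound, which is
consistent with the hang described above.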

Despite this, cloud-init is still able to read and render the networking
configuration sourced from initramfs:

2024-03-05 14:24:10,899 - util.py[DEBUG]: Read 272 bytes from /run/net-ens3.conf
...
2024-03-05 14:24:10,914 - stages.py[INFO]: Applying network configuration from 
initramfs bringup=False: {'config': [{'type': 'physical', 'name': 'ens3', 
'subnets': [{'type': 'dhcp', 'control': 'manual', 'netmask': '255.255.255.0', 
'broadcast': '10.0.0.255', 'gateway': '10.0.0.1', 'dns_nameservers': 
['169.254.169.254']}], 'mac_address': '02:00:17:0f:50:8d'}], 'version': 1}
2024-03-05 14:24:10,914 - util.py[DEBUG]: Writing to 
/run/cloud-init/sem/apply_network_config.once - wb: [644] 23 bytes
2024-03-05 14:24:10,915 - distros[DEBUG]: Selected renderer 'netplan' from 
priority list: ['netplan', 'eni', 'sysconfig']
2024-03-05 14:24:10,918 - subp.py[DEBUG]: Running command ['netplan', 'info'] 
with allowed return codes [0] (shell=False, capture=True)
2024-03-05 14:24:11,109 - subp.py[DEBUG]: command ['netplan', 'info'] took 0.1s 
to run
2024-03-05 14:24:11,109 - util.py[DEBUG]: Attempting to load yaml from string 
of length 332 with allowed root types (<class 'dict'>,)
2024-03-05 14:24:11,111 - util.py[DEBUG]: Writing to 
/etc/netplan/50-cloud-init.yaml - wb: [600] 481 bytes
2024-03-05 14:24:11,111 - subp.py[DEBUG]: Running command ['netplan', 
'generate'] with allowed return codes [0] (shell=False, capture=True)
2024-03-05 14:24:11,300 - subp.py[DEBUG]: command ['netplan', 'generate'] took 
0.1s to run

This allows networking to come up as expected on the primary interface,
but cloud-init has been unable to fetch userdata/metadata or retrieve
information about any secondary interfaces.
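
To make the applied configuration easier to read, here is a small sketch
that summarizes it. The dictionary and the lease address are copied from
the log lines above; the helpers `summarize` and `on_link` are hypothetical
and exist only for this illustration.

```python
import ipaddress

# Values copied from the "Applying network configuration" and DHCP lease logs.
NETWORK_CONFIG = {
    "version": 1,
    "config": [{
        "type": "physical",
        "name": "ens3",
        "mac_address": "02:00:17:0f:50:8d",
        "subnets": [{
            "type": "dhcp", "control": "manual",
            "netmask": "255.255.255.0", "broadcast": "10.0.0.255",
            "gateway": "10.0.0.1", "dns_nameservers": ["169.254.169.254"],
        }],
    }],
}
LEASE_ADDRESS = "10.0.0.133"
IMDS_ADDRESS = "169.254.169.254"


def summarize(config, address):
    """Yield (interface name, subnet in CIDR form, gateway) for each subnet."""
    for entry in config["config"]:
        for subnet in entry["subnets"]:
            iface = ipaddress.ip_interface(f"{address}/{subnet['netmask']}")
            yield entry["name"], str(iface.network), subnet["gateway"]


def on_link(address, cidr):
    """Return True if `address` falls inside the subnet `cidr`."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)
```

Only one interface, ens3, appears in the result, which reflects the missing
secondary-interface data. The IMDS address is not inside the leased subnet,
so `on_link(IMDS_ADDRESS, "10.0.0.0/24")` is false; whether that plays any
part in the hang is not established here.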

SUMMARY:

I see two separate issues here:

1. Cloud-init should be able to deal with the lack of network in early boot. 
This can be fixed on the cloud-init side.
2. Early boot network connectivity works across every other series and instance 
type except for Noble using Hardware-assisted (SR-IOV) networking.

I am unsure of the cause of #2.

** Affects: cloud-images
     Importance: Undecided
         Status: New

** Affects: initramfs-tools (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: cloud-init (Ubuntu)
     Importance: Undecided
         Status: New

-- 
Networking broken in early boot on Oracle Native instances
https://bugs.launchpad.net/bugs/2056194
You received this bug notification because you are a member of Ubuntu Touch 
seeded packages, which is subscribed to initramfs-tools in Ubuntu.

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
