** Description changed:

  BACKGROUND:
  
  cloud-init-local.service runs before networking has started. On
  non-Oracle platforms, before networking has come up, cloud-init will
  create an ephemeral connection to the cloud's IMDS using DHCP to
  retrieve instance metadata. On Oracle, this normally isn't necessary as
  we boot with connectivity to the IMDS out of the box. This can be seen
  in the following Jammy instance using an SR-IOV NIC:
  
  2024-03-05 14:09:05,351 - url_helper.py[DEBUG]: [0/1] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 5.0, 'headers': {'User-Agent': 
'Cloud-Init/23.3.3-0ubuntu0~22.04.1', 'Authorization': 'Bearer Oracle'}} 
configuration
  2024-03-05 14:09:05,362 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/opc/v2/instance/ (200, 2663b) after 1 attempts
  2024-03-05 14:09:05,362 - ephemeral.py[DEBUG]: Skip ephemeral DHCP setup, 
instance has connectivity to {'url': 'http://169.254.169.254/opc/v2/instance/', 
'headers': {'Authorization': 'Bearer Oracle'}, 'timeout': 5}
  2024-03-05 14:09:05,362 - url_helper.py[DEBUG]: [0/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'headers': {'User-Agent': 
'Cloud-Init/23.3.3-0ubuntu0~22.04.1', 'Authorization': 'Bearer Oracle'}} configuration
  2024-03-05 14:09:05,368 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/opc/v2/instance/ (200, 2663b) after 1 attempts
  
  Notice the "Skip ephemeral DHCP setup, instance has connectivity"
  message. It means that cloud-init has determined that it already has
  connectivity and doesn't need to do any additional setup to retrieve
  data from the IMDS.
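
  As a rough illustration of the check described above (not cloud-init's
  actual code), the decision can be sketched in Python as follows. The
  names has_connectivity and choose_network_setup are hypothetical; the
  URL, header, and 5-second timeout are taken from the log lines above.

```python
import urllib.error
import urllib.request

# Values copied from the log output; everything else here is illustrative.
IMDS_URL = "http://169.254.169.254/opc/v2/instance/"
IMDS_HEADERS = {"Authorization": "Bearer Oracle"}


def has_connectivity(url=IMDS_URL, headers=IMDS_HEADERS, timeout=5.0,
                     opener=urllib.request.urlopen):
    """Return True if a single GET to the IMDS answers 200 within `timeout`."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, connection errors, and socket timeouts all derive
        # from OSError, so any of them counts as "no connectivity".
        return False


def choose_network_setup(probe=has_connectivity):
    """Hypothetical helper: skip ephemeral DHCP only if the probe succeeds."""
    return "skip-ephemeral-dhcp" if probe() else "ephemeral-dhcp"
```

  The probe is passed in as a parameter so that the decision logic can be
  exercised without a real metadata service.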
  
  We can also see the same behavior on a Noble paravirtualized instance:
  
  2024-03-01 20:51:33,482 - url_helper.py[DEBUG]: [0/1] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 5.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
  2024-03-01 20:51:33,488 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/opc/v2/instance/ (200, 3067b) after 1 attempts
  2024-03-01 20:51:33,488 - ephemeral.py[DEBUG]: Skip ephemeral DHCP setup, 
instance has connectivity to {'url': 'http://169.254.169.254/opc/v2/instance/', 
'headers': {'Authorization': 'Bearer Oracle'}, 'timeout': 5}
  2024-03-01 20:51:33,489 - url_helper.py[DEBUG]: [0/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
  2024-03-01 20:51:33,500 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/opc/v2/instance/ (200, 3067b) after 1 attempts
  2024-03-01 20:51:33,501 - util.py[DEBUG]: Writing to 
/run/cloud-init/cloud-id-oracle - wb: [644] 7 bytes
  
  PROBLEM:
  
  On a Noble instance using Hardware-assisted (SR-IOV) networking, this is
  not working. cloud-init-local.service no longer has immediate
  connectivity to the IMDS. Since it cannot connect, it then attempts to
  create an ephemeral connection to the IMDS using DHCP. It is able to
  obtain a DHCP lease, but when it then tries to connect to the IMDS, the
  call just hangs. The call has no timeout, so this results in an instance
  that cannot be logged into, even via the serial console, because
  cloud-init is blocking the rest of boot. A simple cloud-init workaround
  is to add something along the lines of `timeout=2` to
  https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOracle.py#L349 .
  This allows cloud-init to boot. Looking at the logs, we can see that
  cloud-init is unable to connect to the IMDS:
  
  2024-03-05 14:23:54,836 - ephemeral.py[DEBUG]: Received dhcp lease on ens3 
for 10.0.0.133/255.255.255.0
  2024-03-05 14:23:54,837 - url_helper.py[DEBUG]: [0/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
  2024-03-05 14:23:56,841 - url_helper.py[DEBUG]: Please wait 1 seconds while 
we wait to try again
  2024-03-05 14:23:57,842 - url_helper.py[DEBUG]: [1/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
  2024-03-05 14:23:59,847 - url_helper.py[DEBUG]: Please wait 1 seconds while 
we wait to try again
  2024-03-05 14:24:00,847 - url_helper.py[DEBUG]: [2/3] open 
'http://169.254.169.254/opc/v2/instance/' with {'url': 
'http://169.254.169.254/opc/v2/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1', 'Authorization': 'Bearer Oracle'}} 
configuration
  2024-03-05 14:24:02,852 - url_helper.py[DEBUG]: [0/3] open 
'http://169.254.169.254/opc/v1/instance/' with {'url': 
'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
  2024-03-05 14:24:04,855 - url_helper.py[DEBUG]: Please wait 1 seconds while 
we wait to try again
  2024-03-05 14:24:05,855 - url_helper.py[DEBUG]: [1/3] open 
'http://169.254.169.254/opc/v1/instance/' with {'url': 
'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
  2024-03-05 14:24:07,859 - url_helper.py[DEBUG]: Please wait 1 seconds while 
we wait to try again
  2024-03-05 14:24:08,859 - url_helper.py[DEBUG]: [2/3] open 
'http://169.254.169.254/opc/v1/instance/' with {'url': 
'http://169.254.169.254/opc/v1/instance/', 'stream': False, 'allow_redirects': 
True, 'method': 'GET', 'timeout': 2.0, 'headers': {'User-Agent': 
'Cloud-Init/24.1~7g54599148-0ubuntu1'}} configuration
  2024-03-05 14:24:10,863 - handlers.py[DEBUG]: finish: 
init-local/search-Oracle: FAIL: no local data found from DataSourceOracle
  2024-03-05 14:24:10,863 - util.py[WARNING]: Getting data from <class 
'cloudinit.sources.DataSourceOracle.DataSourceOracle'> failed
  2024-03-05 14:24:10,863 - util.py[DEBUG]: Getting data from <class 
'cloudinit.sources.DataSourceOracle.DataSourceOracle'> failed
  Traceback (most recent call last):
-   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
370, in read_opc_metadata
-     instance_data = _fetch(metadata_version, path="instance")
-                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
346, in _fetch
-     return readurl(
-            ^^^^^^^^
-   File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 370, in 
readurl
-     raise excps[-1]
+   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
370, in read_opc_metadata
+     instance_data = _fetch(metadata_version, path="instance")
+                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
346, in _fetch
+     return readurl(
+            ^^^^^^^^
+   File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 370, in 
readurl
+     raise excps[-1]
  cloudinit.url_helper.UrlError: HTTPConnectionPool(host='169.254.169.254', 
port=80): Read timed out. (read timeout=2.0)
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
-   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
1028, in find_source
-     if s.update_metadata_if_supported(
-        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
914, in update_metadata_if_supported
-     result = self.get_data()
-              ^^^^^^^^^^^^^^^
-   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
460, in get_data
-     return_value = self._check_and_get_data()
-                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
-   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
392, in _check_and_get_data
-     return self._get_data()
-            ^^^^^^^^^^^^^^^^
-   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
165, in _get_data
-     fetched_metadata = read_opc_metadata(
-                        ^^^^^^^^^^^^^^^^^^
-   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
373, in read_opc_metadata
-     instance_data = _fetch(metadata_version, path="instance")
-                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
346, in _fetch
-     return readurl(
-            ^^^^^^^^
-   File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 370, in 
readurl
-     raise excps[-1]
+   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
1028, in find_source
+     if s.update_metadata_if_supported(
+        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
914, in update_metadata_if_supported
+     result = self.get_data()
+              ^^^^^^^^^^^^^^^
+   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
460, in get_data
+     return_value = self._check_and_get_data()
+                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
+   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 
392, in _check_and_get_data
+     return self._get_data()
+            ^^^^^^^^^^^^^^^^
+   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
165, in _get_data
+     fetched_metadata = read_opc_metadata(
+                        ^^^^^^^^^^^^^^^^^^
+   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
373, in read_opc_metadata
+     instance_data = _fetch(metadata_version, path="instance")
+                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOracle.py", line 
346, in _fetch
+     return readurl(
+            ^^^^^^^^
+   File "/usr/lib/python3/dist-packages/cloudinit/url_helper.py", line 370, in 
readurl
+     raise excps[-1]
  cloudinit.url_helper.UrlError: HTTPConnectionPool(host='169.254.169.254', 
port=80): Read timed out. (read timeout=2.0)
  2024-03-05 14:24:10,898 - main.py[DEBUG]: No local datasource found
  
  Despite this, cloud-init is still able to read and render the networking
  configuration sourced from initramfs:
  
  2024-03-05 14:24:10,899 - util.py[DEBUG]: Read 272 bytes from 
/run/net-ens3.conf
  ...
  2024-03-05 14:24:10,914 - stages.py[INFO]: Applying network configuration 
from initramfs bringup=False: {'config': [{'type': 'physical', 'name': 'ens3', 
'subnets': [{'type': 'dhcp', 'control': 'manual', 'netmask': '255.255.255.0', 
'broadcast': '10.0.0.255', 'gateway': '10.0.0.1', 'dns_nameservers': 
['169.254.169.254']}], 'mac_address': '02:00:17:0f:50:8d'}], 'version': 1}
  2024-03-05 14:24:10,914 - util.py[DEBUG]: Writing to 
/run/cloud-init/sem/apply_network_config.once - wb: [644] 23 bytes
  2024-03-05 14:24:10,915 - distros[DEBUG]: Selected renderer 'netplan' from 
priority list: ['netplan', 'eni', 'sysconfig']
  2024-03-05 14:24:10,918 - subp.py[DEBUG]: Running command ['netplan', 'info'] 
with allowed return codes [0] (shell=False, capture=True)
  2024-03-05 14:24:11,109 - subp.py[DEBUG]: command ['netplan', 'info'] took 
0.1s to run
  2024-03-05 14:24:11,109 - util.py[DEBUG]: Attempting to load yaml from string 
of length 332 with allowed root types (<class 'dict'>,)
  2024-03-05 14:24:11,111 - util.py[DEBUG]: Writing to 
/etc/netplan/50-cloud-init.yaml - wb: [600] 481 bytes
  2024-03-05 14:24:11,111 - subp.py[DEBUG]: Running command ['netplan', 
'generate'] with allowed return codes [0] (shell=False, capture=True)
  2024-03-05 14:24:11,300 - subp.py[DEBUG]: command ['netplan', 'generate'] 
took 0.1s to run
  
  This allows networking to come up as expected on the primary interface,
  but cloud-init has been unable to fetch userdata/metadata or retrieve
  information about any secondary interfaces.
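
  To illustrate why a per-request timeout bounds the delay seen in the
  log above, here is a minimal sketch of a bounded retry loop. It is not
  cloud-init's readurl; fetch_with_retries and its parameters are
  hypothetical, with defaults chosen to mirror the log (three attempts,
  2-second timeout, 1-second wait between attempts).

```python
import time


def fetch_with_retries(fetch, url, timeout=2.0, retries=2, sec_between=1.0,
                       sleep=time.sleep):
    """Call fetch(url, timeout=...) up to retries + 1 times.

    `fetch` is any callable that raises OSError (for example TimeoutError)
    on failure. `retries` must be >= 0. The last error is re-raised once
    all attempts are exhausted.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(url, timeout=timeout)
        except OSError as exc:
            last_error = exc
            # Wait between attempts, but not after the final one.
            if attempt < retries:
                sleep(sec_between)
    raise last_error
```

  Assuming the timeout really does bound each attempt, the worst case per
  URL with these defaults is 3 * 2 s + 2 * 1 s = 8 s. That is consistent
  with the roughly 8 seconds between the first v2 attempt (14:23:54) and
  the first v1 attempt (14:24:02) in the log above.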
  
  SUMMARY:
  
- I see two separate issues here.
+ I see two separate issues here:
+ 
  1. Cloud-init should be able to deal with the lack of network in early boot. 
This can be fixed on the cloud-init side.
- 2. Early boot network connectivity works across every other series and 
instance type except for Noble using Hardware-assisted (SR-IOV) networking
+ 2. Early boot network connectivity works across every other series and 
instance type except for Noble using Hardware-assisted (SR-IOV) networking.
  
  I am unsure of the cause of #2.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2056194

Title:
  Networking broken in early boot on Oracle Native instances

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2056194/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
