*** This bug is a duplicate of bug 1821102 ***
    https://bugs.launchpad.net/bugs/1821102

The logs do contain that bug, but I'm not sure that's the failure here.

Looking at the cloud-init logs, we can see the Ephemeral DHCP start and
obtain an lease, but the end point does not respond:

2019-10-30 13:28:41,516 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 
136.156.90.74/255.255.254.0
2019-10-30 13:28:41,516 - __init__.py[DEBUG]: Attempting setup of ephemeral 
network on eth0 with 136.156.90.74/23 brd 136.156.91.255
2019-10-30 13:28:41,516 - util.py[DEBUG]: Running command ['ip', '-family', 
'inet', 'addr', 'add', '136.156.90.74/23', 'broadcast', '136.156.91.255', 
'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:28:41,519 - util.py[DEBUG]: Running command ['ip', '-family', 
'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] 
(shell=False, capture=True)
2019-10-30 13:28:41,522 - util.py[DEBUG]: Running command ['ip', 'route', 
'show', '0.0.0.0/0'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:28:41,525 - util.py[DEBUG]: Running command ['ip', '-4', 'route', 
'add', '136.156.91.254', 'dev', 'eth0', 'src', '136.156.90.74'] with allowed 
return codes [0] (shell=False, capture=True)
2019-10-30 13:28:41,527 - util.py[DEBUG]: Running command ['ip', '-4', 'route', 
'add', 'default', 'via', '136.156.91.254', 'dev', 'eth0'] with allowed return 
codes [0] (shell=False, capture=True)
2019-10-30 13:29:21,573 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 
took 40.044 seconds
2019-10-30 13:29:21,574 - url_helper.py[DEBUG]: [0/1] open 
'http://169.254.169.254/openstack' with {'url': 
'http://169.254.169.254/openstack', 'headers': {'User-Agent': 
'Cloud-Init/18.5'}, 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0} 
configuration
2019-10-30 13:29:31,608 - url_helper.py[DEBUG]: Calling 
'http://169.254.169.254/openstack' failed [10/-1s]: request error 
[HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with 
url: /openstack (Caused by 
ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object 
at 0x7fdc5e5dc210>, 'Connection to 169.254.169.254 timed out. (connect 
timeout=10.0)'))]
2019-10-30 13:29:31,608 - DataSourceOpenStack.py[DEBUG]: Giving up on OpenStack 
md from ['http://169.254.169.254/openstack'] after 10 seconds
2019-10-30 13:29:31,609 - util.py[DEBUG]: Crawl of metadata service took 50.079 
seconds

Cloud-init did not find the OpenStack datasource at local time:

2019-10-30 13:29:31,689 - main.py[DEBUG]: No local datasource found

So we write out a fallback network config (dhcp on eth0).

2019-10-30 13:29:31,722 - stages.py[INFO]: Applying network configuration from 
fallback bringup=False: {'version': 1, 'config': [{'subnets': [{'type': 
'dhcp'}], 'type': 'physical', 'name': 'eth0', 'mac_address': 
'fa:16:3e:f4:b5:1d'}]}
2019-10-30 13:29:31,723 - __init__.py[DEBUG]: Selected renderer 'sysconfig' 
from priority list: None
2019-10-30 13:29:31,725 - util.py[DEBUG]: Writing to 
/etc/sysconfig/network-scripts/ifcfg-eth0 - wb: [644] 159 bytes
2019-10-30 13:29:31,726 - util.py[DEBUG]: Restoring selinux mode for 
/etc/sysconfig/network-scripts/ifcfg-eth0 (recursive=False)
2019-10-30 13:29:31,726 - util.py[DEBUG]: Restoring selinux mode for 
/etc/sysconfig/network-scripts/ifcfg-eth0 (recursive=False)

And then exit cloud-init-local.service:

2019-10-30 13:29:31,730 - util.py[DEBUG]: cloud-init mode 'init' took
52.389 seconds (52.39)

Now, the OS networking layer comes up:

Oct 30 13:29:31.623535 localhost.localdomain cloud-init[732]: 2019-10-30 
13:29:31,619 - util.py[WARNING]: No active metadata service found
Oct 30 13:29:31.751295 localhost.localdomain systemd[1]: Started Initial 
cloud-init job (pre-networking).
Oct 30 13:29:31.754472 localhost.localdomain systemd[1]: Reached target Network 
(Pre).
Oct 30 13:29:31.756812 localhost.localdomain systemd[1]: Starting LSB: Bring 
up/down networking...
Oct 30 13:29:31.961050 localhost.localdomain network[866]: Bringing up loopback 
interface:  [  OK  ]
Oct 30 13:29:31.990878 localhost.localdomain network[866]: Bringing up 
interface eth0:
Oct 30 13:29:32.034411 localhost.localdomain dhclient[995]: DHCPDISCOVER on 
eth0 to 255.255.255.255 port 67 interval 8 (xid=0xeceb89e)
Oct 30 13:29:32.055157 localhost.localdomain dhclient[995]: DHCPREQUEST on eth0 
to 255.255.255.255 port 67 (xid=0xeceb89e)
Oct 30 13:29:32.055188 localhost.localdomain dhclient[995]: DHCPOFFER from 
136.156.90.11
Oct 30 13:29:32.055796 localhost.localdomain dhclient[995]: DHCPACK from 
136.156.90.11 (xid=0xeceb89e)
Oct 30 13:29:34.132189 localhost.localdomain NET[1054]: 
/usr/sbin/dhclient-script : updated /etc/resolv.conf
Oct 30 13:29:34.166284 host-136-156-90-74 dhclient[995]: bound to 136.156.90.74 
-- renewal in 34571 seconds.
Oct 30 13:29:34.167466 host-136-156-90-74 network[866]: Determining IP 
information for eth0... done.
Oct 30 13:29:34.224637 host-136-156-90-74 network[866]: [  OK  ]
Oct 30 13:29:34.245397 host-136-156-90-74 systemd[1]: Started LSB: Bring 
up/down networking.
Oct 30 13:29:34.249783 host-136-156-90-74 systemd[1]: Reached target Network.

And cloud-init.service starts up, and this time it is able to connect to
the IMDS:

2019-10-30 13:29:34,638 - __init__.py[DEBUG]: Seeing if we can get any data 
from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStack'>
2019-10-30 13:29:34,638 - __init__.py[DEBUG]: Update datasource metadata and 
network config due to events: New instance first boot
2019-10-30 13:29:34,638 - util.py[DEBUG]: Running command 
['systemd-detect-virt', '--quiet', '--container'] with allowed return codes [0] 
(shell=False, capture=True)
2019-10-30 13:29:34,641 - util.py[DEBUG]: Running command 
['running-in-container'] with allowed return codes [0] (shell=False, 
capture=True)
2019-10-30 13:29:34,644 - util.py[DEBUG]: Running command ['lxc-is-container'] 
with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:29:34,648 - util.py[DEBUG]: Reading from /proc/1/environ 
(quiet=False)
2019-10-30 13:29:34,648 - util.py[DEBUG]: Read 0 bytes from /proc/1/environ
2019-10-30 13:29:34,648 - util.py[DEBUG]: Reading from /proc/self/status 
(quiet=False)
2019-10-30 13:29:34,648 - util.py[DEBUG]: Read 1206 bytes from /proc/self/status
2019-10-30 13:29:34,648 - util.py[DEBUG]: querying dmi data 
/sys/class/dmi/id/product_name
2019-10-30 13:29:34,648 - util.py[DEBUG]: Reading from 
/sys/class/dmi/id/product_name (quiet=False)
2019-10-30 13:29:34,648 - util.py[DEBUG]: Read 18 bytes from 
/sys/class/dmi/id/product_name
2019-10-30 13:29:34,648 - util.py[DEBUG]: dmi data 
/sys/class/dmi/id/product_name returned OpenStack Compute
2019-10-30 13:29:34,654 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 
took 0.006 seconds
2019-10-30 13:29:34,654 - url_helper.py[DEBUG]: [0/1] open 
'http://169.254.169.254/openstack' with {'url': 
'http://169.254.169.254/openstack', 'headers': {'User-Agent': 
'Cloud-Init/18.5'}, 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0} 
configuration
2019-10-30 13:29:36,359 - url_helper.py[DEBUG]: Read from 
http://169.254.169.254/openstack (200, 94b) after 1 attempts
...
2019-10-30 13:29:37,717 - util.py[DEBUG]: Crawl of openstack metadata service 
took 1.358 seconds
2019-10-30 13:29:37,718 - __init__.py[WARNING]: Error persisting 
instance-data.json: 'utf8' codec can't decode byte 0xd5 in position 1: invalid 
continuation byte
2019-10-30 13:29:37,718 - handlers.py[DEBUG]: finish: 
init-network/search-OpenStack: SUCCESS: found network data from 
DataSourceOpenStack
2019-10-30 13:29:37,718 - stages.py[INFO]: Loaded datasource 
DataSourceOpenStack - DataSourceOpenStack [net,ver=2]

So we know we're OpenStack, and as Scott pointed out, it suffers from
the py27 json bug.

For this bug, the warning is not a False positive, rather, there seems
to be an issue with connecting to the metadata service early in boot.  I
suspect we'd need some debugging enabled to see why we can't connect to
the endpoint after a successful dhclient.

One interesting bit would be to compare the leases file from cloud-init
Ephemeral DHCP, with the system dhcp.  Another thought is if the image
has any custom dhclient hooks that setup additional route information
needed to hit the metadata service.

We've seen that bug before,

https://bugs.launchpad.net/cloud-init/+bug/1821102

Which is not in 18.5, so possibly a dupe of that issue.



** This bug has been marked a duplicate of bug 1821102
   Cloud-init should not setup ephemeral ipv4 if apply_network_config is False 
for OpenStack

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1850642

Title:
  Cloud init unable to find the metadata service but can CURL it

Status in cloud-init:
  New

Bug description:
  In a tripleo Rocky deployment I am seeing the behaviour bellow.

  In the bootup logs I see that the metadata service could not be
  reached.

    File 
"/usr/lib/python2.7/site-packages/cloudinit/sources/DataSourceOpenStack.py", 
line 177, in _crawl_metadata
      'No active metadata service found')

  
  However the service is curlable from inside the VM and I am also seeing some 
requests in the metadata-service in Openstack. Furthemore I am getting SSH keys 
inserted so I believe this might be a false warning.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1850642/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

Reply via email to