OK, thanks for the logs. Could you collect them again with sudo (or as root) and re-attach them? The default user on SLES does not have permission to read the journal.
What I see so far suggests that networking did not come up after cloud-init-local.service completed and wrote out a network config.

2019-09-11 18:00:15,242 - stages.py[INFO]: Applying network configuration from ds bringup=False: {'ethernets': {'eth0': {'set-name': 'eth0', 'match': {'macaddress': u'00:0d:3a:6e:6f:8f'}, 'dhcp4': True}}, 'version': 2}

This results in the following files being written:

% cat test_azure_sles/etc/sysconfig/network/ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=00:0d:3a:6e:6f:8f
NM_CONTROLLED=no
ONBOOT=yes
STARTMODE=auto
TYPE=Ethernet
USERCTL=no

Upstream cloud-init on SLES does not generate/update /etc/resolv.conf, but the logs show that the cloud-init in use here does:

2019-09-11 18:00:15,246 - util.py[DEBUG]: Writing to /etc/sysconfig/network/ifcfg-eth0 - wb: [644] 191 bytes
2019-09-11 18:00:15,247 - util.py[DEBUG]: Reading from /etc/resolv.conf (quiet=False)
2019-09-11 18:00:15,247 - util.py[DEBUG]: Read 795 bytes from /etc/resolv.conf
2019-09-11 18:00:15,247 - util.py[DEBUG]: Writing to /etc/resolv.conf - wb: [644] 866 bytes

At first, I thought maybe it was missing this commit:

% git show b74ebca563a21332b29482c8029e7908f60225a4
commit b74ebca563a21332b29482c8029e7908f60225a4
Author: Robert Schweikert <rjsch...@suse.com>
Date:   Wed Jan 23 22:35:32 2019 +0000

    net/sysconfig: do not write a resolv.conf file with only the header.

    Writing the file with no dns information may prevent distro tools
    from writing a resolv.conf file with dns information obtained from
    a dhcp server.

diff --git a/cloudinit/net/sysconfig.py b/cloudinit/net/sysconfig.py
index ae41f7b..fd8e501 100644
--- a/cloudinit/net/sysconfig.py
+++ b/cloudinit/net/sysconfig.py
@@ -557,6 +557,8 @@ class Renderer(renderer.Renderer):
             content.add_nameserver(nameserver)
         for searchdomain in network_state.dns_searchdomains:
             content.add_search_domain(searchdomain)
+        if not str(content):
+            return None
         header = _make_header(';')
         content_str = str(content)
         if not content_str.startswith(header):
@@ -666,7 +668,8 @@ class Renderer(renderer.Renderer):
             dns_path = util.target_path(target, self.dns_path)
             resolv_content = self._render_dns(network_state,
                                               existing_dns_path=dns_path)
-            util.write_file(dns_path, resolv_content, file_mode)
+            if resolv_content:
+                util.write_file(dns_path, resolv_content, file_mode)
         if self.networkmanager_conf_path:
             nm_conf_path = util.target_path(target,
                                             self.networkmanager_conf_path)
diff --git a/tests/unittests/test_net.py b/tests/unittests/test_net.py
index d679e92..5313d2d 100644
--- a/tests/unittests/test_net.py
+++ b/tests/unittests/test_net.py
@@ -2098,6 +2098,10 @@ TYPE=Ethernet
 USERCTL=no
 """
         self.assertEqual(expected, found[nspath + 'ifcfg-interface0'])
+        # The configuration has no nameserver information make sure we
+        # do not write the resolv.conf file
+        respath = '/etc/resolv.conf'
+        self.assertNotIn(respath, found.keys())
 
     def test_config_with_explicit_loopback(self):
         ns = network_state.parse_net_config_data(CONFIG_V1_EXPLICIT_LOOPBACK)
@@ -2456,6 +2460,10 @@ TYPE=Ethernet
 USERCTL=no
 """
         self.assertEqual(expected, found[nspath + 'ifcfg-interface0'])
+        # The configuration has no nameserver information make sure we
+        # do not write the resolv.conf file
+        respath = '/etc/resolv.conf'
+        self.assertNotIn(respath, found.keys())
 
     def test_config_with_explicit_loopback(self):
         ns = network_state.parse_net_config_data(CONFIG_V1_EXPLICIT_LOOPBACK)

But I believe that commit is in 19.1 (or is likely patched into the distro version).

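For anyone skimming, the idea behind that commit can be sketched roughly as below. This is a simplified stand-in and not the actual cloud-init renderer: `HEADER`, `render_dns` and `write_dns` are made-up names, and the handling of an already existing resolv.conf is left out entirely.

```python
import os
import tempfile

# Hypothetical header text, modelled on the comment header cloud-init writes.
HEADER = "; Created by cloud-init on instance boot automatically, do not edit.\n;\n"


def render_dns(nameservers, search_domains):
    """Return resolv.conf text, or None if there is no DNS data at all."""
    lines = ["nameserver %s" % ns for ns in nameservers]
    if search_domains:
        lines.append("search " + " ".join(search_domains))
    if not lines:
        # Only the header would be written: signal "skip the file" instead.
        return None
    return HEADER + "\n".join(lines) + "\n"


def write_dns(path, nameservers, search_domains):
    """Write resolv.conf only when render_dns() produced content."""
    content = render_dns(nameservers, search_domains)
    if content:
        with open(path, "w") as fh:
            fh.write(content)
    return content is not None


# A dhcp4-only config carries no static nameservers, so nothing is written
# and the distro tooling stays free to create the file from the DHCP lease.
demo_path = os.path.join(tempfile.mkdtemp(), "resolv.conf")
wrote = write_dns(demo_path, [], [])
print(wrote, os.path.exists(demo_path))
```

The point of returning None rather than an empty string is that the caller can tell "nothing to write" apart from "write this text" with a plain truthiness check, which is what the second hunk of the diff does.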
Later in the boot, we can see that networking didn't actually come up: the Azure datasource can't find a lease file and then goes into some sort of fallback mode which tries to bring up networking (it does), but not with DHCP, which is why you're missing DNS (it's provided via an option in the DHCP response).

2019-09-11 18:00:15,946 - azure.py[DEBUG]: Unable to find endpoint in dhclient logs. Falling back to check lease files
2019-09-11 18:00:15,946 - azure.py[DEBUG]: Looking for endpoint in lease file /var/lib/dhcp/dhclient.eth0.leases
2019-09-11 18:00:15,946 - handlers.py[DEBUG]: start: azure-ds/_get_value_from_leases_file: _get_value_from_leases_file
2019-09-11 18:00:15,946 - util.py[DEBUG]: Reading from /var/lib/dhcp/dhclient.eth0.leases (quiet=False)
2019-09-11 18:00:15,947 - azure.py[ERROR]: Failed to read /var/lib/dhcp/dhclient.eth0.leases: [Errno 2] No such file or directory: '/var/lib/dhcp/dhclient.eth0.leases'
2019-09-11 18:00:15,959 - handlers.py[DEBUG]: finish: azure-ds/_get_value_from_leases_file: SUCCESS: _get_value_from_leases_file
2019-09-11 18:00:15,959 - util.py[DEBUG]: Running command ['ifconfig'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:16,020 - azure.py[DEBUG]: ifconfig out:
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:140 (140.0 b)  TX bytes:140 (140.0 b)
, err:
2019-09-11 18:00:16,020 - util.py[DEBUG]: Running command ['route', '-n'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:16,093 - azure.py[DEBUG]: route out:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
, err:
2019-09-11 18:00:16,093 - util.py[DEBUG]: Running command ['ip', 'a'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:16,095 - azure.py[DEBUG]: ip out:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:0d:3a:6e:6f:8f brd ff:ff:ff:ff:ff:ff
, err:
2019-09-11 18:00:16,095 - util.py[DEBUG]: Running command ['ifup', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,824 - azure.py[DEBUG]: ifup out:
eth0            up
, err:
2019-09-11 18:00:31,824 - util.py[DEBUG]: Running command ['ip', '-o', 'route', 'list'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,827 - azure.py[DEBUG]: ip out:
default via 10.0.0.1 dev eth0 proto dhcp
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.17
168.63.129.16 via 10.0.0.1 dev eth0 proto dhcp
169.254.169.254 via 10.0.0.1 dev eth0 proto dhcp
, err:
2019-09-11 18:00:31,828 - util.py[DEBUG]: Running command ['ifconfig'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,830 - azure.py[DEBUG]: ifconfig out:
eth0      Link encap:Ethernet  HWaddr 00:0D:3A:6E:6F:8F
          inet addr:10.0.0.17  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20d:3aff:fe6e:6f8f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2709 (2.6 Kb)  TX bytes:3373 (3.2 Kb)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:468 (468.0 b)  TX bytes:468 (468.0 b)
, err:
2019-09-11 18:00:31,831 - util.py[DEBUG]: Running command ['route', '-n'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,834 - azure.py[DEBUG]: route out:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
168.63.129.16   10.0.0.1        255.255.255.255 UGH   0      0        0 eth0
169.254.169.254 10.0.0.1        255.255.255.255 UGH   0      0        0 eth0
, err:
2019-09-11 18:00:31,834 - util.py[DEBUG]: Running command ['ip', 'a'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,837 - azure.py[DEBUG]: ip out:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:6e:6f:8f brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.17/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe6e:6f8f/64 scope link
       valid_lft forever preferred_lft forever
, err:
2019-09-11 18:00:31,837 - azure.py[WARNING]: No lease found; using default endpoint
2019-09-11 18:00:31,837 - azure.py[DEBUG]: Azure endpoint found at 168.63.129.16
2019-09-11 18:00:31,837 - handlers.py[DEBUG]: finish: azure-ds/find_endpoint: SUCCESS: find_endpoint

So, I'd also like to see the contents of:

/etc/resolv.conf
/etc/sysconfig/network/ifcfg-eth0

Please also run

sudo cloud-init collect-logs

so we can get the journal, which should reveal why SLES's networking service didn't come online before cloud-init.service started.

** Changed in: cloud-init
       Status: New => Incomplete

** Also affects: cloud-init (Suse)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1843634

Title:
  cloud-init misconfigure the network on SLES

Status in cloud-init:
  Incomplete
Status in cloud-init package in Suse:
  New

Bug description:
  I reproduced the issue on an Azure VM with SLES12 SP4 and cloud-init 19.1.

  DNS is unreachable when cloud-init takes responsibility for configuring the network. No nameservers or search domains are added to /etc/resolv.conf, as follows:

  ; Created by cloud-init on instance boot automatically, do not edit.
  ;
  ### /etc/resolv.conf file autogenerated by netconfig!
  #
  # Before you change this file manually, consider to define the
  # static DNS configuration using the following variables in the
  # /etc/sysconfig/network/config file:
  #     NETCONFIG_DNS_STATIC_SEARCHLIST
  #     NETCONFIG_DNS_STATIC_SERVERS
  #     NETCONFIG_DNS_FORWARDER
  # or disable DNS configuration updates via netconfig by setting:
  #     NETCONFIG_DNS_POLICY=''
  #
  # See also the netconfig(8) manual page and other documentation.
  #
  # Note: Manual change of this file disables netconfig too, but
  # may get lost when this file contains comments or empty lines
  # only, the netconfig settings are same with settings in this
  # file and in case of a "netconfig update -f" call.
  #
  ### Please remove (at least) this line when you modify the file!

  I also attached "/etc/sysconfig/network/config" in the first comment for your reference.

  When I disable the network configuration in cloud-init and leave it to netconfig, /etc/resolv.conf is correctly populated with the search domain and the nameserver, and DNS is reachable. Here are the contents of /etc/resolv.conf:

  ### /etc/resolv.conf file autogenerated by netconfig!
  #
  # Before you change this file manually, consider to define the
  # static DNS configuration using the following variables in the
  # /etc/sysconfig/network/config file:
  #     NETCONFIG_DNS_STATIC_SEARCHLIST
  #     NETCONFIG_DNS_STATIC_SERVERS
  #     NETCONFIG_DNS_FORWARDER
  # or disable DNS configuration updates via netconfig by setting:
  #     NETCONFIG_DNS_POLICY=''
  #
  # See also the netconfig(8) manual page and other documentation.
  #
  # Note: Manual change of this file disables netconfig too, but
  # may get lost when this file contains comments or empty lines
  # only, the netconfig settings are same with settings in this
  # file and in case of a "netconfig update -f" call.
  #
  ### Please remove (at least) this line when you modify the file!
  search xkf00b0rtzgejkug4xc2pcinre.xx.internal.cloudapp.net
  nameserver 168.63.129.16

  When I tried to populate the network config dictionary that's built by DataSourceAzure with a default nameserver "168.63.129.16" and search domain "xkf00b0rtzgejkug4xc2pcinre.xx.internal.cloudapp.net", DNS was reachable. But it's my understanding that cloud-init should be able to figure out this nameserver and the search domain the same way netconfig does.

  Another issue is that the eth0 interface is not brought up automatically, even though the contents of the file "/etc/sysconfig/network/ifcfg-eth0" seem correct:

  # Created by cloud-init on instance boot automatically, do not edit.
  #
  BOOTPROTO=dhcp
  DEVICE=eth0
  HWADDR=00:0d:3a:06:1e:04
  NM_CONTROLLED=no
  ONBOOT=yes
  STARTMODE=auto
  TYPE=Ethernet
  USERCTL=no

  I had to bring it up by executing "ifup eth0" from the Azure.py code after the network config is applied. This way I was able to ssh into the VM.

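  The nameserver workaround described above might look roughly like the sketch below. This is an assumption about its shape, not the reporter's actual change: `add_static_dns` is a hypothetical helper, and the example dictionary is the version-2 config from the cloud-init log rather than anything taken from DataSourceAzure itself. It relies only on the `nameservers` key (`addresses`, `search`) of the version-2 network config format.

```python
import copy


def add_static_dns(netcfg, iface, addresses, search):
    """Return a copy of a version-2 network config with DNS set on iface.

    Hypothetical helper for illustration; the input dict is left untouched.
    """
    cfg = copy.deepcopy(netcfg)
    cfg["ethernets"][iface]["nameservers"] = {
        "addresses": list(addresses),
        "search": list(search),
    }
    return cfg


# The config cloud-init logged for this instance (version 2 format).
netcfg = {
    "ethernets": {
        "eth0": {
            "set-name": "eth0",
            "match": {"macaddress": "00:0d:3a:6e:6f:8f"},
            "dhcp4": True,
        }
    },
    "version": 2,
}

patched = add_static_dns(
    netcfg,
    "eth0",
    ["168.63.129.16"],
    ["xkf00b0rtzgejkug4xc2pcinre.xx.internal.cloudapp.net"],
)
print(patched["ethernets"]["eth0"]["nameservers"])
```

  Hard-coding these values only papers over the problem, since the search domain is specific to the instance and would normally arrive with the DHCP lease.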
  Here are also the contents of "etc/udev/rules.d/85-persistent-net-cloud-init.rules":

  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0d:3a:6d:e4:53", NAME="eth0"

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1843634/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp