While we do have sporadic messages like this in our nginx error.log, they started piling up around the time this issue was reported to us, starting with this message:
2022/02/15 01:49:24 [error] 3341359#3341359: *1929977 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.229.95.139, server: , request: "POST /MAAS/metadata/status/ww4mgk HTTP/1.1", upstream: "http://10.155.212.2:5240/MAAS/metadata/status/ww4mgk", host: "10.229.32.21:5248" Around this time we started seeing these pile up in rackd.log: 2022-02-15 01:40:07 provisioningserver.rpc.clusterservice: [critical] Failed to contact region. (While requesting RPC info at http://localhost:5240/MAAS). Our regiond processes are running, and I don't see anything that seems abnormal in the regiond log around this time. However, these symptoms reminded me of a similar issue in bug 1908452, so I started debugging it similarly. Like bug 1908452, I see one regiond process stuck in a recv call: root@maas:/var/snap/maas/common/log# strace -p 3340720 strace: Process 3340720 attached recvfrom(23, All the other regiond processes are making progress, but not this one. The server it is talking to appears to be this canonical server, which I can't currently resolve: root@maas:/var/snap/maas/common/log# lsof -i -a -p 3340720 | grep 23 python3 3340720 root 23u IPv4 3487880288 0t0 TCP maas:42848->https-services.aerodent.canonical.com:http (ESTABLISHED) root@maas:/var/snap/maas/common/log# host https-services.aerodent.canonical.com Host https-services.aerodent.canonical.com not found: 3(NXDOMAIN) However, I suspect it maybe related to image fetching again. In our regiond logs, I see that the the last log entry related to images appears to have been about an hour before things locked up: root@maas:/var/snap/maas/common/log# grep image regiond.log | tail -1 2022-02-15 00:38:51 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/streams/v1/maas:v2:download.json HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1) Prior to that, we have log entries every hour, but none after. So maybe simplestreams has other places that need a timeout? ** Changed in: cloud-init Status: New => Invalid ** Also affects: simplestreams Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1960944 Title: cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes Status in cloud-init: Invalid Status in MAAS: New Status in simplestreams: New Bug description: Not able to deploy baremetal (arm64 and amd64) on a snap-based MAAS: 3.1.0 (maas 3.1.0-10901-g.f1f8f1505 18199 3.1/stable) from MAAS event log: ``` Tue, 15 Feb. 2022 17:35:33 Node changed status - From 'Deploying' to 'Failed deployment' Tue, 15 Feb. 2022 17:35:33 Marking node failed - Node operation 'Deploying' timed out after 30 minutes. Tue, 15 Feb. 2022 17:07:44 Node installation - 'cloudinit' searching for network data from DataSourceMAAS Tue, 15 Feb. 2022 17:06:44 Node installation - 'cloudinit' attempting to read from cache [trust] Tue, 15 Feb. 2022 17:06:42 Node installation - 'cloudinit' attempting to read from cache [check] Tue, 15 Feb. 2022 17:05:29 Performing PXE boot Tue, 15 Feb. 2022 17:05:29 PXE Request - installation Tue, 15 Feb. 2022 17:03:52 Node powered on ``` Server console log shows: ``` ubuntu login: Starting Message of the Day... [ OK ] Listening on Socket unix for snap application lxd.daemon. Starting Service for snap application lxd.activate... [ OK ] Finished Service for snap application lxd.activate. [ OK ] Started snap.lxd.hook.conf…-4400-96a8-0c5c9e438c51.scope. Starting Time & Date Service... [ OK ] Started Time & Date Service. [ OK ] Finished Wait until snapd is fully seeded. Starting Apply the settings specified in cloud-config... [ OK ] Reached target Multi-User System. [ OK ] Reached target Graphical Interface. Starting Update UTMP about System Runlevel Changes... [ OK ] Finished Update UTMP about System Runlevel Changes. [ 322.036861] cloud-init[2034]: Can not apply stage config, no datasource found! Likely bad things to come! [ 322.037477] cloud-init[2034]: ------------------------------------------------------------ [ 322.037907] cloud-init[2034]: Traceback (most recent call last): [ 322.038341] cloud-init[2034]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 521, in main_modules [ 322.038783] cloud-init[2034]: init.fetch(existing="trust") [ 322.039181] cloud-init[2034]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 411, in fetch [ 322.039584] cloud-init[2034]: return self._get_data_source(existing=existing) [ 322.040031] cloud-init[2034]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 311, in _get_data_source [ 322.040381] cloud-init[2034]: (ds, dsname) = sources.find_source(self.cfg, [ 322.040807] cloud-init[2034]: File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 854, in find_source [ 322.041221] cloud-init[2034]: raise DataSourceNotFoundException(msg) [ 322.041659] cloud-init[2034]: cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: () [ 322.042076] cloud-init[2034]: ------------------------------------------------------------ [FAILED] Failed to start Apply the …ngs specified in cloud-config. See 'systemctl status cloud-config.service' for details. Starting Execute cloud user/final scripts... [ 324.029369] cloud-init[2042]: Can not apply stage final, no datasource found! Likely bad things to come! [ 324.029979] cloud-init[2042]: ------------------------------------------------------------ [ 324.030420] cloud-init[2042]: Traceback (most recent call last): [ 324.030835] cloud-init[2042]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 521, in main_modules [ 324.031258] cloud-init[2042]: init.fetch(existing="trust") [ 324.031647] cloud-init[2042]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 411, in fetch [ 324.032111] cloud-init[2042]: return self._get_data_source(existing=existing) [ 324.032483] cloud-init[2042]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 311, in _get_data_source [ 324.032865] cloud-init[2042]: (ds, dsname) = sources.find_source(self.cfg, [ 324.033292] cloud-init[2042]: File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 854, in find_source [ 324.033690] cloud-init[2042]: raise DataSourceNotFoundException(msg) [ 324.034112] cloud-init[2042]: cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: () [ 324.034559] cloud-init[2042]: ------------------------------------------------------------ [FAILED] Failed to start Execute cloud user/final scripts. See 'systemctl status cloud-final.service' for details. [ OK ] Reached target Cloud-init target. [ OK ] Finished Message of the Day. ``` To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1960944/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp