While we do have sporadic messages like this in our nginx error.log,
they started piling up around the time this issue was reported to us,
starting with this message:

2022/02/15 01:49:24 [error] 3341359#3341359: *1929977 upstream timed out
(110: Connection timed out) while reading response header from upstream,
client: 10.229.95.139, server: , request: "POST
/MAAS/metadata/status/ww4mgk HTTP/1.1", upstream:
"http://10.155.212.2:5240/MAAS/metadata/status/ww4mgk";, host:
"10.229.32.21:5248"

Around this time we started seeing these pile up in rackd.log:
2022-02-15 01:40:07 provisioningserver.rpc.clusterservice: [critical] Failed to 
contact region. (While requesting RPC info at http://localhost:5240/MAAS).

Our regiond processes are running, and I don't see anything that seems
abnormal in the regiond log around this time. However, these symptoms
reminded me of a similar issue in bug 1908452, so I started debugging it
similarly. Like bug 1908452, I see one regiond process stuck in a recv
call:

root@maas:/var/snap/maas/common/log# strace -p 3340720
strace: Process 3340720 attached
recvfrom(23, 

All the other regiond processes are making progress, but not this one.

The server it is talking to appears to be this canonical server, which I
can't currently resolve:

root@maas:/var/snap/maas/common/log# lsof -i -a -p  3340720 | grep 23
python3 3340720 root   23u  IPv4 3487880288      0t0  TCP 
maas:42848->https-services.aerodent.canonical.com:http (ESTABLISHED)
root@maas:/var/snap/maas/common/log# host https-services.aerodent.canonical.com
Host https-services.aerodent.canonical.com not found: 3(NXDOMAIN)

However, I suspect it maybe related to image fetching again. In our
regiond logs, I see that the the last log entry related to images
appears to have been about an hour before things locked up:

root@maas:/var/snap/maas/common/log# grep image regiond.log | tail -1
2022-02-15 00:38:51 regiond: [info] 127.0.0.1 GET 
/MAAS/images-stream/streams/v1/maas:v2:download.json HTTP/1.1 --> 200 OK 
(referrer: -; agent: python-simplestreams/0.1)

Prior to that, we have log entries every hour, but none after. So maybe
simplestreams has other places that need a timeout?

** Changed in: cloud-init
       Status: New => Invalid

** Also affects: simplestreams
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1960944

Title:
  cloudinit.sources.DataSourceNotFoundException: Did not find any data
  source, searched classes

Status in cloud-init:
  Invalid
Status in MAAS:
  New
Status in simplestreams:
  New

Bug description:
  Not able to deploy baremetal (arm64 and amd64) on a 
  snap-based MAAS: 3.1.0 (maas      3.1.0-10901-g.f1f8f1505  18199  3.1/stable) 

  from MAAS event log: 
  ```
  Tue, 15 Feb. 2022 17:35:33    Node changed status - From 'Deploying' to 
'Failed deployment'
   Tue, 15 Feb. 2022 17:35:33   Marking node failed - Node operation 
'Deploying' timed out after 30 minutes.
   Tue, 15 Feb. 2022 17:07:44   Node installation - 'cloudinit' searching for 
network data from DataSourceMAAS
   Tue, 15 Feb. 2022 17:06:44   Node installation - 'cloudinit' attempting to 
read from cache [trust]
   Tue, 15 Feb. 2022 17:06:42   Node installation - 'cloudinit' attempting to 
read from cache [check]
   Tue, 15 Feb. 2022 17:05:29   Performing PXE boot
   Tue, 15 Feb. 2022 17:05:29   PXE Request - installation
   Tue, 15 Feb. 2022 17:03:52   Node powered on
  ```

  
  Server console log shows: 

  ```
  ubuntu login:          Starting Message of the Day...
  [  OK  ] Listening on Socket unix for snap application lxd.daemon.
           Starting Service for snap application lxd.activate...
  [  OK  ] Finished Service for snap application lxd.activate.
  [  OK  ] Started snap.lxd.hook.conf…-4400-96a8-0c5c9e438c51.scope.
           Starting Time & Date Service...
  [  OK  ] Started Time & Date Service.
  [  OK  ] Finished Wait until snapd is fully seeded.
           Starting Apply the settings specified in cloud-config...
  [  OK  ] Reached target Multi-User System.
  [  OK  ] Reached target Graphical Interface.
           Starting Update UTMP about System Runlevel Changes...
  [  OK  ] Finished Update UTMP about System Runlevel Changes.
  [  322.036861] cloud-init[2034]: Can not apply stage config, no datasource 
found! Likely bad things to come!
  [  322.037477] cloud-init[2034]: 
------------------------------------------------------------
  [  322.037907] cloud-init[2034]: Traceback (most recent call last):
  [  322.038341] cloud-init[2034]:   File 
"/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 521, in 
main_modules
  [  322.038783] cloud-init[2034]:     init.fetch(existing="trust")
  [  322.039181] cloud-init[2034]:   File 
"/usr/lib/python3/dist-packages/cloudinit/stages.py", line 411, in fetch
  [  322.039584] cloud-init[2034]:     return 
self._get_data_source(existing=existing)
  [  322.040031] cloud-init[2034]:   File 
"/usr/lib/python3/dist-packages/cloudinit/stages.py", line 311, in 
_get_data_source
  [  322.040381] cloud-init[2034]:     (ds, dsname) = 
sources.find_source(self.cfg,
  [  322.040807] cloud-init[2034]:   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 854, in 
find_source
  [  322.041221] cloud-init[2034]:     raise DataSourceNotFoundException(msg)
  [  322.041659] cloud-init[2034]: 
cloudinit.sources.DataSourceNotFoundException: Did not find any data source, 
searched classes: ()
  [  322.042076] cloud-init[2034]: 
------------------------------------------------------------
  [FAILED] Failed to start Apply the …ngs specified in cloud-config.
  See 'systemctl status cloud-config.service' for details.
           Starting Execute cloud user/final scripts...
  [  324.029369] cloud-init[2042]: Can not apply stage final, no datasource 
found! Likely bad things to come!
  [  324.029979] cloud-init[2042]: 
------------------------------------------------------------
  [  324.030420] cloud-init[2042]: Traceback (most recent call last):
  [  324.030835] cloud-init[2042]:   File 
"/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 521, in 
main_modules
  [  324.031258] cloud-init[2042]:     init.fetch(existing="trust")
  [  324.031647] cloud-init[2042]:   File 
"/usr/lib/python3/dist-packages/cloudinit/stages.py", line 411, in fetch
  [  324.032111] cloud-init[2042]:     return 
self._get_data_source(existing=existing)
  [  324.032483] cloud-init[2042]:   File 
"/usr/lib/python3/dist-packages/cloudinit/stages.py", line 311, in 
_get_data_source
  [  324.032865] cloud-init[2042]:     (ds, dsname) = 
sources.find_source(self.cfg,
  [  324.033292] cloud-init[2042]:   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 854, in 
find_source
  [  324.033690] cloud-init[2042]:     raise DataSourceNotFoundException(msg)
  [  324.034112] cloud-init[2042]: 
cloudinit.sources.DataSourceNotFoundException: Did not find any data source, 
searched classes: ()
  [  324.034559] cloud-init[2042]: 
------------------------------------------------------------
  [FAILED] Failed to start Execute cloud user/final scripts.
  See 'systemctl status cloud-final.service' for details.
  [  OK  ] Reached target Cloud-init target.
  [  OK  ] Finished Message of the Day.
  ```

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1960944/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

Reply via email to