So, the root cause is completely clear: dbus.socket starts early, then
cloud-init starts, which blocks the entire basic.target (early boot) on
network operations, so dbus.service cannot start. nss-resolve already
sees dbus.socket and can therefore connect (instead of failing fast),
but then runs into the 25 s D-Bus timeout because D-Bus is blocked.
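The difference between "fail fast" and "wait for the timeout" can be reproduced outside of D-Bus. This is a minimal Python sketch, not how nss-resolve or D-Bus are implemented. It assumes a plain Unix stream socket stands in for dbus.socket and a short timeout stands in for the 25 s D-Bus timeout. The helper names try_request and compare are hypothetical.

```python
import os
import socket
import tempfile
import time


def try_request(path, timeout=0.5):
    """Connect to a Unix socket, send a request, wait for a reply.

    Returns (outcome, seconds_elapsed).
    """
    start = time.monotonic()
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        client.sendall(b"hello")
        outcome = "reply" if client.recv(1) else "closed"
    except (FileNotFoundError, ConnectionRefusedError):
        outcome = "failed fast"  # no socket: the client gives up immediately
    except socket.timeout:
        outcome = "timed out"    # socket exists, but nobody ever answers
    finally:
        client.close()
    return outcome, time.monotonic() - start


def compare(timeout=0.5):
    with tempfile.TemporaryDirectory() as tmp:
        # Case 1: no socket at all (dbus.socket not started yet).
        missing = try_request(os.path.join(tmp, "no-socket"), timeout)
        # Case 2: socket is bound and listening, but no daemon serves it
        # (dbus.socket started, dbus.service blocked behind basic.target).
        path = os.path.join(tmp, "bus")
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            listener.bind(path)
            listener.listen(1)  # connections queue up, nothing accepts them
            silent = try_request(path, timeout)
        finally:
            listener.close()
    return missing, silent


if __name__ == "__main__":
    labels = ("no socket", "socket, no daemon")
    for label, (outcome, secs) in zip(labels, compare()):
        print(f"{label}: {outcome} after {secs:.2f}s")
```

The second case is the one nss-resolve hits during boot: the connection succeeds, so it cannot fall back to the next nsswitch.conf entry until the timeout expires.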

 - Moving dbus.service into early boot would be a bold step, and I don't
think such a large change is appropriate two weeks before release.

 - Rearranging nsswitch.conf and modifying it on the fly also sounds like
a big no.

 - I'd also not like to generally move dbus.socket into late boot, as
that would break other services during early boot which queue up a
connection to D-Bus.

 - So far the cleanest way out of this would be to also make dbus.socket
wait for cloud-init.service, as that already blocks dbus.service. I
verified that name resolution is then fast again. This also doesn't
cause dependency loops, as both cloud-init.service and sockets.target
run in early boot.
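Why the extra ordering is loop-free can be checked with a topological sort. This is a sketch over a simplified, hypothetical ordering graph built only from the relationships described in this comment (the real units carry many more dependencies), using Python's graphlib. The names boot_order, has_loop, BASE_EDGES, PROPOSED and LATE_BOOT are illustrative.

```python
from graphlib import CycleError, TopologicalSorter


def boot_order(edges):
    """Return one valid start order for (first, then) pairs, where `first`
    is ordered Before= `then`. Raises CycleError on an ordering loop."""
    sorter = TopologicalSorter()
    for first, then in edges:
        sorter.add(then, first)  # `then` may only start after `first`
    return list(sorter.static_order())


def has_loop(edges):
    try:
        boot_order(edges)
        return False
    except CycleError:
        return True


# Simplified, hypothetical ordering edges (first, then).
BASE_EDGES = [
    ("dbus.socket", "sockets.target"),
    ("sockets.target", "basic.target"),
    ("cloud-init.service", "basic.target"),  # cloud-init blocks basic.target
    ("basic.target", "dbus.service"),        # dbus.service waits for basic.target
]

# The proposed fix: cloud-init.service gets Before=dbus.socket.
PROPOSED = BASE_EDGES + [("cloud-init.service", "dbus.socket")]

# Counterexample: if cloud-init.service ran in late boot (after
# basic.target), the same Before=dbus.socket would close a loop.
LATE_BOOT = [
    ("dbus.socket", "sockets.target"),
    ("sockets.target", "basic.target"),
    ("basic.target", "cloud-init.service"),
    ("basic.target", "dbus.service"),
    ("cloud-init.service", "dbus.socket"),
]

if __name__ == "__main__":
    print(boot_order(PROPOSED))
    print("late-boot variant loops:", has_loop(LATE_BOOT))
```

The loop in the late-boot variant is dbus.socket, sockets.target, basic.target, cloud-init.service and back to dbus.socket; it only disappears because cloud-init.service itself sits before basic.target.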

Could you try adding "Before=dbus.socket" to
/lib/systemd/system/cloud-init.service and confirm that this helps? (It
does for me in a container, but I don't have access to GCE or EC2.)
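For testing, the same ordering can also be added without editing the packaged unit in /lib (which a package upgrade would overwrite) by using a systemd drop-in under /etc/systemd/system/cloud-init.service.d/. This Python sketch writes such a drop-in; the file name before-dbus-socket.conf and the helper write_dropin are hypothetical, and the `root` parameter only exists so it can be tried against a scratch directory first.

```python
from pathlib import Path

# Drop-in content: Before= in a drop-in is added to the unit's existing
# ordering, so this only appends dbus.socket to cloud-init.service's list.
DROPIN_BODY = "[Unit]\nBefore=dbus.socket\n"


def write_dropin(root="/", name="before-dbus-socket.conf"):
    """Write a drop-in adding Before=dbus.socket to cloud-init.service.

    `root` allows targeting a scratch directory instead of the live system.
    """
    dropin_dir = Path(root) / "etc/systemd/system/cloud-init.service.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name
    path.write_text(DROPIN_BODY)
    return path


if __name__ == "__main__":
    # Needs root on a real system; afterwards run `systemctl daemon-reload`
    # and reboot to test the boot ordering.
    print(write_dropin())
```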

-- 
https://bugs.launchpad.net/bugs/1629797

Title:
  resolve service in nsswitch.conf adds 25 seconds to failed lookups
  before systemd-resolved is up

Status in cloud-init package in Ubuntu:
  Invalid
Status in systemd package in Ubuntu:
  Triaged

Bug description:
  During boot, cloud-init does DNS resolution checks to see if particular
  metadata services are available (in order to determine which cloud it
  is running on). These checks happen before systemd-resolved is up[0],
  and lookups that fail to resolve take 25 seconds each to complete.

  This has a substantial impact on boot time in all contexts, because
  cloud-init attempts to resolve three known-invalid addresses
  ("does-not-exist.example.com.", "example.invalid." and a random
  string) to detect when it's running in an environment where a DNS
  server will always return some sort of redirect. As such, we're
  talking about a minimum impact of 75 seconds (3 x 25 s) in all
  environments. This increases when cloud-init is configured to check
  for multiple environments.

  This means that yakkety is consistently taking 2-3 minutes to boot on
  EC2 and GCE, compared to the ~30 seconds of the first boot and ~10
  seconds thereafter in xenial.
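The redirect check from the bug description can be sketched as follows. This is not cloud-init's actual code: it assumes a pluggable resolver callable (socket.gethostbyname by default), and the helper name probe_redirect and the format of the random name are hypothetical. It also shows why the delays add up: the probes run one after another, so three failing lookups at 25 s each cost 3 x 25 = 75 s.

```python
import socket
import time
import uuid


def probe_redirect(resolve=socket.gethostbyname, names=None):
    """Resolve names that must not exist; any answer suggests DNS redirection.

    Returns (redirect_ips, seconds_per_name).
    """
    if names is None:
        names = [
            "does-not-exist.example.com.",
            "example.invalid.",
            uuid.uuid4().hex + ".",  # random label; format is an assumption
        ]
    redirect_ips, timings = set(), {}
    for name in names:  # sequential: per-name delays add up
        start = time.monotonic()
        try:
            redirect_ips.add(resolve(name))  # an answer here means a redirect
        except (socket.gaierror, socket.herror):
            pass  # expected outcome: the name does not resolve
        timings[name] = time.monotonic() - start
    return redirect_ips, timings


if __name__ == "__main__":
    ips, timings = probe_redirect()
    print("redirect IPs:", ips or "none")
    print(f"total lookup time: {sum(timings.values()):.1f}s")
```

Passing the resolver in keeps the sketch testable without network access, for example with a stub that always raises socket.gaierror.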
