> It is definitely not Hetzner's task to fix Ubuntu.

To be clear, cloud-init is not used only on Ubuntu; I believe that
Hetzner's outage would have this effect across the majority of Linux
distributions.

And, that aside, I don't think this characterisation is fair: we're
suggesting that if Hetzner are going to allow their internal services to
go down, then they should provide a more reliable way for instances to
determine their identity.  (This is generally done via DMI in other
clouds that do it.  The hypervisor stores the instance ID and provides
it as a DMI value, and obviously instances can only boot if the
hypervisor is up; therefore, the instance ID is always available.)  To
state this more glibly (and therefore less helpfully): it is not
cloud-init's task to fix Hetzner.
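
To illustrate the DMI approach mentioned above, here is a minimal sketch
of how an instance might read its ID locally.  It is not cloud-init's
actual code: the function name is hypothetical, and the choice of
product_serial is an assumption, since which DMI field carries the
instance ID varies from cloud to cloud.

```python
from pathlib import Path
from typing import Optional

# Assumption: the hypervisor exposes the instance ID in this DMI field.
# Other clouds may use a different field (e.g. product_uuid).
DMI_SERIAL = Path("/sys/class/dmi/id/product_serial")


def read_dmi_instance_id(path: Path = DMI_SERIAL) -> Optional[str]:
    """Return the instance ID found at `path`, or None if unavailable."""
    try:
        value = path.read_text().strip()
    except OSError:
        # Missing file or insufficient permissions (often root-only).
        return None
    # Treat an empty field as "no instance ID".
    return value or None
```

The point is that no network service is involved: if the instance is
running at all, the hypervisor has already supplied the value.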

That said, perhaps there is something that the Hetzner data source could
do to handle this Hetzner-specific case.  We already perform 60 retries
with a 2-second wait between them, and a 2-second timeout.  So we allow
at least 2 minutes (60 waits of 2 seconds each) for the services to
respond with something before we give up; we could bump that, but I
don't think it addresses the underlying issue.  Any thoughts would be
appreciated.
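
For reference, the retry behaviour described above can be sketched
roughly as follows.  This is an illustration rather than the data
source's real implementation: the function names are hypothetical, and
`fetch` is assumed to enforce its own per-request timeout and to raise
OSError on failure.

```python
import time


def fetch_with_retries(fetch, retries=60, wait=2.0, sleep=time.sleep):
    """Call fetch() once, then retry up to `retries` times on OSError.

    `sleep` is injectable so the loop can be exercised without waiting.
    """
    last_error = None
    for attempt in range(retries + 1):
        if attempt:
            # Wait before every retry, but not before the first attempt.
            sleep(wait)
        try:
            return fetch()
        except OSError as exc:
            last_error = exc
    raise last_error


def minimum_wait_seconds(retries=60, wait=2.0):
    """Time spent waiting between attempts if every attempt fails."""
    return retries * wait
```

With the defaults, minimum_wait_seconds() gives 120 seconds, which is
where the "at least 2 minutes" figure comes from; any time spent in
per-request timeouts comes on top of that.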

Alternatively (or perhaps additionally), this may need a change in the
instance ID model that cloud-init uses to handle an explicit "we are not
currently able to determine instance ID, so assume it hasn't changed".
I think, however, that this would lead to a converse problem: instances
launched from instance-capture images which boot for the first time
during an "instance ID outage" would not detect that they were new
instances, and so would not perform their first boot customisation.
This would result in potentially-inaccessible instances (if any
credentials remaining in the image are not available to the user
launching instances) with SSH host keys not rotated (meaning that they
would all have the same host keys as the image; a security issue).  Of
course, if users are also relying on their cloud-init user-data to
perform any actions, that also won't occur; depending on their threat
model, some users might also consider this a security issue.

The ultimate problem is that, when cloud-init runs within an instance,
it cannot determine on its own whether or not this is a "first boot":
the cloud needs to indicate to us one way or the other, which is done
via instance ID.  If the cloud cannot do that, then there is no way to
determine the correct behaviour.
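
The decision described above can be sketched as a small function.  This
is a simplified model rather than cloud-init's actual logic; the names
BootKind and classify_boot are hypothetical, and "unavailable" is
represented here as None.

```python
from enum import Enum
from typing import Optional


class BootKind(Enum):
    FIRST_BOOT = "first boot"
    SUBSEQUENT_BOOT = "subsequent boot"
    UNKNOWN = "unknown"


def classify_boot(cached_id: Optional[str],
                  current_id: Optional[str]) -> BootKind:
    """Compare the instance ID cached on disk with the one the cloud reports."""
    if current_id is None:
        # The cloud could not tell us the instance ID, so neither
        # answer can be known to be correct.
        return BootKind.UNKNOWN
    if cached_id is None or cached_id != current_id:
        # No cached ID, or an ID from another instance (e.g. an image
        # captured from it): run first boot customisation.
        return BootKind.FIRST_BOOT
    return BootKind.SUBSEQUENT_BOOT
```

The UNKNOWN case is the one at issue: treating it as a first boot
regenerates SSH host keys on an existing instance, while treating it as
a subsequent boot leaves a newly launched instance uncustomised.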

If you are certain that you will never be capturing instances as images
(i.e. you can categorically say that the root filesystem in this
instance will _never_ first boot again) and you aren't using any of
cloud-init's functionality after first boot (e.g. per-boot scripts),
then you can disable cloud-init in the ways described by Scott earlier
in this bug.

One convenience we could potentially provide: if cloud-init had a way
for image creators to express "when next launched, cloud-init should
treat that instance ID as immutable and permanent" (in a way that could
be undone on subsequent boots, if a user wants to "unfreeze" an instance
for image capture) then we might be able to avoid some of this pain, but
that idea would need more fleshing out before it's clear if it even
makes sense.

> Especially since that process of re-initialization of that instance
> ID is neither obvious nor documented.

Agreed on both counts; cloud-init's documentation is lacking in many
respects, including this one.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1885527

Title:
  cloud-init regenerating ssh-keys

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1885527/+subscriptions
