You have been subscribed to a public bug:

# Our problem #

During a regular update of our container environment, `systemd` (and the
related packages libpam-systemd, libsystemd0, libudev1, systemd-sysv and
udev) were updated from `249.11-0ubuntu3.6` to `249.11-0ubuntu3.7`.
We're talking only about Ubuntu 22.04. Our Ubuntu 20.04 is working fine
with `systemctl daemon-reexec`.

In my opinion, the update was not the problem because we've tried
downgrading and tried these versions: (current) `249.11-0ubuntu3.7`,
`249.11-0ubuntu3.6`, `249.11-0ubuntu3.4` and `249.11-0ubuntu3.3`. The
symptoms were the same.

# Symptoms #

The `/var/lib/dpkg/info/systemd.postinst` executes a `systemctl daemon-
reexec` and that ended in a disaster. It seems that `systemd` is
forgetting all it started children and tries to start nearly every
configured service again. Naturally, the old services are still running,
and the ports can't be opened twice and `systemd` won't give up. Here
are some(!) of the logfiles:

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Create Volatile Files and 
Directories...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 130 (systemd-udevd) in control group while starting unit. 
Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 31475 (systemd-udevd) in control group while starting unit. 
Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 31476 (systemd-udevd) in control group while starting unit. 
Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.

And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target System Initialization.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt download activities.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt upgrade and clean 
activities.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily dpkg database backup 
timer.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Periodic ext4 Online Metadata 
Check for All Filesystems.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Condition check resulted in Discard 
unused blocks once a week being skipped.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily rotation of log files.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily man-db regeneration.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Message of the Day.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Clean PHP session files every 
30 mins.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Update the plocate database 
daily.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily Cleanup of Temporary 
Directories.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Basic System.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: System is tainted: cgroupsv1
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Timer Units.

And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over process 
206 (atd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Deferred execution 
scheduler...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: cron.service: Found left-over process 
164 (cron) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Regular background program 
processing daemon.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: dbus.service: Found left-over process 
177 (dbus-daemon) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started D-Bus System Message Bus.

And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: rsyslog.service: Found left-over 
process 204 (rsyslogd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Failed with result 
'exit-code'.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Unit process 206 (atd) 
remains running after unit stopped.

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 382 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 392 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 397 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 3052 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting The Apache HTTP Server...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Stopped Deferred execution scheduler.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over process 
206 (atd) in control group while starting unit. Ignoring.

And...

Mar 31 12:51:40 FQDN_REDACTED sshd[31772]: error: Bind to port 22 on
0.0.0.0 failed: Address already in use.

And...

Mar 31 12:52:06 FQDN_REDACTED systemd[1]: Started The Salt Minion.
Mar 31 12:52:06 FQDN_REDACTED salt-minion[32339]: The Salt Minion is shutdown.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Main process 
exited, code=exited, status=1/FAILURE
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Failed with 
result 'exit-code'.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 
2808 (/opt/saltstack/) remains running after unit stopped.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 
2848 (/opt/saltstack/) remains running after unit stopped.

Other internal `systemd` process were started again:

root           1  0.0  0.1 101204 12444 ?        Ss   10:19   0:03 
/lib/systemd/systemd -z        --system --deserialize 16
root          75  0.0  0.1  31440 13484 ?        Ss   10:19   0:00 
/lib/systemd/systemd-journald
systemd+     159  0.0  0.0  16124  8004 ?        Ss   10:19   0:00 
/lib/systemd/systemd-networkd
message+     177  0.0  0.0   8252  4440 ?        Ss   10:19   0:00 @dbus-daemon 
--system --address=systemd: --nofork --nopidfile --systemd-activation 
--syslog-only
root         205  0.0  0.0  14908  6464 ?        Ss   10:19   0:00 
/lib/systemd/systemd-logind
systemd+     223  0.0  0.1  25268 12592 ?        Ss   10:19   0:00 
/lib/systemd/systemd-resolved
root       31424  0.0  0.1  31424 13636 ?        Ss   12:51   0:00 
/lib/systemd/systemd-journald
systemd+   31636  0.0  0.0  16124  6588 ?        Ss   12:51   0:00 
/lib/systemd/systemd-networkd
message+   31639  0.0  0.0   8124  3804 ?        Ss   12:51   0:00 @dbus-daemon 
--system --address=systemd: --nofork --nopidfile --systemd-activation 
--syslog-only
root       31682  0.0  0.0  14908  6480 ?        Ss   12:51   0:00 
/lib/systemd/systemd-logind
systemd+   31686  0.0  0.1  25268 12580 ?        Ss   12:51   0:00 
/lib/systemd/systemd-resolved
root       32087  0.0  0.0  21436  5252 ?        Ss   12:51   0:00 
/lib/systemd/systemd-udevd

You can either kill all the old processes and restart them, and then
everything is fine. Or you can reboot the container. Besides that
`systemctl daemon-reexec` the `systemd` version is running fine.
`systemctl daemon-reload` is working like a charme.

# Normal case #

In the normal case a `systemctl daemon-reexec` just prints only a few
lines:

Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Reexecuting.
Mar 31 14:21:58 FQDN_REDACTED systemd[1]: systemd 249.11-0ubuntu3.7 running in 
system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT 
+GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD 
+LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ 
+ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Detected architecture x86-64.

# Testcase #

Doing a `systemctl daemon-restart` and `ssh localhost` shows the
problem. `systemctl` removes the directory `/run/sshd` during the reexec
and `ssh` will refuse further connects because the directory is missing.

$ systemctl daemon-restart
$ ssh root@localhost
kex_exchange_identification: read: Connection reset by peer
Connection reset by 127.0.0.1 port 22
$

Killing the old instance of SSH and restarting it will work.

# Some details to the hardware #

Our metal runs OpenVZ/Virtuozzo with this kernel (without any problems):

> Linux FQDN_REDACTED 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20
21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux

The container with the `systemctl daemon-reexec` problem reports the
following kernel:

Linux FQDN_REDACTED 5.4.0 #1 SMP Thu Apr 22 16:18:59 MSK 2021 x86_64
x86_64 x86_64 GNU/Linux

# Upshot #

* Can somebody help me with this issue?
* Why is `systemctl` losing its internal state about the running 
processes/services?
* Why is `systemctl` restarting everything?

** Affects: systemd (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: container systemd
-- 
systemctl daemon-reexec forgets running services and starts everything new
https://bugs.launchpad.net/bugs/2013543
You received this bug notification because you are a member of Ubuntu Touch 
seeded packages, which is subscribed to systemd in Ubuntu.

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to