After IRC conversations and more testing, I think that I have a clean reproduction of this bug, along with a root cause.
The root cause: the charm takes control of the radosgw service and changes its name, but doesn't remove the old nrpe check.

To reproduce:

1) juju deploy the following bundle: https://paste.ubuntu.com/p/wpVt447Vwz/
2) juju ssh into ceph-radosgw/0 and note that there is a "check_radosgw.cfg" in /etc/nagios/nrpe.d.
3) Trigger the config-changed hook on the ceph-radosgw charm. You might change the number of ceph replicas, for example.
4) Note that there is now a "check_ceph-radosgw@<hostname>.cfg" check, in addition to the check_radosgw.cfg check.
5) Run both checks (cat the files to get the command). Note that the new, hostname-based check succeeds, but the old check does not.

The original check will also fail if you run it during step 2, suggesting that the service has already been changed, but the nagios monitoring is not updated until the config-changed hook runs.

This bug can be closed once the charm places checks in /etc/nagios/nrpe.d that accurately reflect the running services and cleans up outdated checks.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1825843

Title:
  systemd issues with bionic-rocky causing nagios alert and can't restart daemon

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-radosgw/+bug/1825843/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs