After IRC conversations and more testing, I think that I have a clean
reproduction of this bug, along with a root cause.

The root cause: the charm takes control of the radosgw service and
renames it, but doesn't remove the old NRPE check.

To reproduce:

1) juju deploy the following bundle: https://paste.ubuntu.com/p/wpVt447Vwz/
2) juju ssh into ceph-radosgw/0 and note that there is a "check_radosgw.cfg" in 
/etc/nagios/nrpe.d.
3) Trigger the config-changed hook on the ceph-radosgw charm. You might 
change the number of ceph replicas, for example.
4) Note that there is now a "check_ceph-radosgw@<hostname>.cfg" check, in 
addition to the check_radosgw.cfg check.
5) Run both checks (cat the files to get the command). Note that the new, 
hostname-based check succeeds, but the old check does not.
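
As a rough sketch of step 5, the script below reads the radosgw-related
files in /etc/nagios/nrpe.d, extracts each "command[<name>]=<command line>"
definition, and runs it. The function names and the glob pattern are
assumptions made for illustration and are not part of the charm. It would
need to be run on the ceph-radosgw/0 unit itself.

```python
import re
import shlex
import subprocess
from pathlib import Path

# NRPE command definitions have the form: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def parse_nrpe_commands(text):
    """Return {check_name: command_line} for every definition in text."""
    commands = {}
    for line in text.splitlines():
        match = COMMAND_RE.match(line)  # comment lines simply do not match
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


def run_checks(nrpe_dir="/etc/nagios/nrpe.d", pattern="check_*radosgw*.cfg"):
    """Run each matching check; return {name: (exit_code, stdout)}."""
    results = {}
    for cfg in sorted(Path(nrpe_dir).glob(pattern)):
        for name, cmd in parse_nrpe_commands(cfg.read_text()).items():
            proc = subprocess.run(
                shlex.split(cmd), capture_output=True, text=True
            )
            results[name] = (proc.returncode, proc.stdout.strip())
    return results


if __name__ == "__main__":
    # Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    for name, (code, output) in run_checks().items():
        print(f"{name}: exit={code} {output}")
```

If the behaviour described above holds, the old check_radosgw entry should
report a non-zero exit code while the hostname-based one reports 0.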

The original check will also fail if you run it during step 2,
suggesting that the service has been changed, but the nagios monitoring
is not updated until the config-changed hook runs.

This bug can be closed once the charm places checks in
/etc/nagios/nrpe.d that accurately reflect the running services, and
cleans up outdated checks as well.
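
The cleanup half of that condition could look something like the sketch
below. It is only an illustration of the idea, not the charm's actual fix:
the function name, the "radosgw" substring filter, and the assumption that
each check lives in a file named check_<name>.cfg are all mine.

```python
import socket
from pathlib import Path


def stale_check_files(present_files, expected_checks):
    """Return radosgw check files that match no expected check name.

    present_files: file names found in the NRPE directory.
    expected_checks: check names that should exist (without the
    "check_" prefix or ".cfg" suffix).
    """
    expected = {f"check_{name}.cfg" for name in expected_checks}
    return sorted(
        name
        for name in present_files
        if name.startswith("check_")
        and name.endswith(".cfg")
        and "radosgw" in name  # leave unrelated checks alone
        and name not in expected
    )


if __name__ == "__main__":
    nrpe_dir = Path("/etc/nagios/nrpe.d")
    present = [p.name for p in nrpe_dir.glob("*.cfg")]
    # Assumes the current check is named after the unit's hostname,
    # as observed in step 4.
    expected = [f"ceph-radosgw@{socket.gethostname()}"]
    for name in stale_check_files(present, expected):
        print(f"stale: {nrpe_dir / name}")
```

The script only reports stale files; removing them and restarting the NRPE
daemon is left out on purpose.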

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1825843

Title:
  systemd issues with bionic-rocky causing nagios alert and can't
  restart daemon

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-radosgw/+bug/1825843/+subscriptions

