[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-10 Thread Janek Bevendorff
Do you have the prometheus module enabled? Turn that off; it's causing issues. I replaced it with another ceph exporter from GitHub and almost forgot about it. Here's the relevant issue report: https://tracker.ceph.com/issues/39264#change-179946 On 10/12/2020 16:43, Welby McRoberts wrote: H
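
For readers who want to try the suggestion above, here is a minimal sketch of turning the mgr prometheus module off and back on. It assumes the standard `ceph mgr module disable` / `ceph mgr module enable` commands and a `ceph` binary with admin credentials on the PATH; the helper names `module_command` and `run` are hypothetical and not from the thread. By default it only prints the command (dry run).

```python
import shutil
import subprocess
from typing import List


def module_command(action: str, module: str = "prometheus") -> List[str]:
    """Build the 'ceph mgr module <action> <module>' command line."""
    if action not in ("enable", "disable"):
        raise ValueError("action must be 'enable' or 'disable'")
    return ["ceph", "mgr", "module", action, module]


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Return the command as a string; execute it only when dry_run is False.

    Assumption: falls back to a dry run when no 'ceph' binary is found.
    """
    text = " ".join(cmd)
    if dry_run or shutil.which("ceph") is None:
        return text
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return text


if __name__ == "__main__":
    # Dry run: show what would be executed to disable the module.
    print(run(module_command("disable")))
    # Re-enable later with: run(module_command("enable"), dry_run=False)
```

Disabling the module stops the mgr from serving Prometheus metrics, so an external exporter would be needed in the meantime.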

[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-10 Thread Janek Bevendorff
FYI, this is the ceph-exporter we're using at the moment: https://github.com/digitalocean/ceph_exporter It's not as good, but it mostly does the job. Some of the more specific metrics are missing, but the majority are there. On 10/12/2020 19:01, Janek Bevendorff wrote: Do you have the prometheus mod

[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-10 Thread David Orman
Hi Janek, We realize this; we referenced that issue in our initial email. We do want the metrics exposed by Ceph internally, and would prefer to work towards a fix upstream. We appreciate the suggestion for a workaround, however! Again, we're happy to provide whatever information we can that woul

[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-11 Thread Wido den Hollander
On 11/12/2020 00:12, David Orman wrote: Hi Janek, We realize this, we referenced that issue in our initial email. We do want the metrics exposed by Ceph internally, and would prefer to work towards a fix upstream. We appreciate the suggestion for a workaround, however! Again, we're happy to

[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-11 Thread David Orman
No. As a number of responses we've seen on the mailing lists and on the bug report(s) have indicated it fixed the situation, we didn't proceed down that path (it seemed highly probable it would resolve things). If it's of additional value, we can disable the module temporarily to see if the probl

[ceph-users] Re: mgr's stop responding, dropping out of cluster with _check_auth_rotating

2020-12-21 Thread David Orman
We've got a PR in to fix this; we validated it resolves the issue in our larger clusters. We could use some help getting this moved forward since it seems to impact a number of users: https://github.com/ceph/ceph/pull/38677 On Fri, Dec 11, 2020 at 9:10 AM David Orman wrote: > No, as the number