[ceph-users] Re: pacific: ceph-mon services stopped after OSDs are out/down

2022-12-22 Thread Mevludin Blazevic
Hi Eugen, thanks for the information! On the test deployment I found that the best approach was to run "ceph orch daemon rm mon. --force" to remove the stopped daemon. After a few minutes, Ceph restarts the daemon on the host in a running state. This command also works on mon
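A minimal sketch of the removal step described above. The helper name mon_rm_cmd is hypothetical, and the daemon name is assumed to follow the mon.<hostname> form seen in the logs of this thread (for example mon.sparci-store1). The function only prints the command so it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the command that removes a
# stopped mon daemon. Assumes the daemon is named mon.<hostname>.
mon_rm_cmd() {
    local host="$1"
    if [ -z "$host" ]; then
        echo "usage: mon_rm_cmd <hostname>" >&2
        return 1
    fi
    printf 'ceph orch daemon rm mon.%s --force\n' "$host"
}
```

Running "mon_rm_cmd sparci-store1" prints the command for that host. After executing it, "ceph orch ps" can be used to watch for the daemon coming back in a running state.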

2022-12-14 Thread Eugen Block
There's an existing tracker issue [1] that hasn't been updated in a year. The OP reported that restarting the other MONs did resolve it; have you tried that? [1] https://tracker.ceph.com/issues/52760 Quoting Mevludin Blazevic: It's very strange. The keyring of the ceph monitor is the

2022-12-13 Thread Mevludin Blazevic
It's very strange. The keyring of the ceph monitor is the same as on one of the working monitor hosts. The failed mon and the working mons also have the same SELinux policies and firewalld settings. The connection is also present, since all OSD daemons are up on the failed ceph monitor node.

2022-12-13 Thread Eugen Block
Did you check the permissions? To me it reads like the permission denied errors prevent the MONs from starting and then as a result they are removed from the monmap: ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug 2022-12-13T10:24:21.599+ 7f317ba4d700 -1

2022-12-13 Thread Mevludin Blazevic
The keyring is the same, but I found the following log lines: Dec 13 12:22:18 sparci-store1 ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[813780]: debug 2022-12-13T11:22:18.016+ 7f789e7f3700  0 mon.sparci-store1@1(probing) e18  removed from monmap, suicide. Dec 13 12:22:18
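To pull the two messages discussed in this thread out of a noisy journal, a small filter can help. This is only a sketch: the function name filter_mon_errors is hypothetical, and the two patterns are taken from the log excerpts quoted here rather than from any complete list of mon failure messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only log lines that mention the two symptoms
# seen in this thread. Reads from stdin, writes matches to stdout.
filter_mon_errors() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E 'removed from monmap|Permission denied' || true
}
```

For example, "journalctl -xe | filter_mon_errors" on the affected host.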

2022-12-13 Thread Eugen Block
So you get "Permission denied" errors. I'm guessing either the mon keyring is not present (or wrong) or the mon directory doesn't belong to the ceph user. Can you check "ls -l /var/lib/ceph/FSID/mon.sparci-store1/"? Compare the keyring file with the ones on the working mon nodes. Quoting
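The two checks suggested above (ownership of the mon directory, and comparing keyring files) could be scripted roughly as follows. Both function names are hypothetical. The owner to check against is passed in by the caller, because the uid of the ceph user depends on the deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every entry under <dir> that is NOT owned by
# <owner> (a user name or numeric uid). No output means ownership is consistent.
list_foreign_owned() {
    local dir="$1" owner="$2"
    find "$dir" ! -user "$owner" -print
}

# Hypothetical helper: succeed (exit 0) only if the two keyring files are
# byte-identical, for example one copied over from a working mon host.
same_keyring() {
    cmp -s "$1" "$2"
}
```

For example, "list_foreign_owned /var/lib/ceph/FSID/mon.sparci-store1 167", where FSID is the cluster fsid and 167 is assumed to be the uid of the ceph user inside the containers; verify that value on a working mon host first.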

2022-12-13 Thread Mevludin Blazevic
Hi Eugen, I assume the mon db is stored on the "OS disk". I could not find any error-related lines in cephadm.log; here is what journalctl -xe tells me: Dec 13 11:24:21 sparci-store1 ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug 2022-12-13T10:24:21.392+

2022-12-08 Thread Eugen Block
Hi, do the MONs use the same SAS interface? They store the mon db on local disk, so it might be related. But without any logs or more details it's just guessing. Regards, Eugen Quoting Mevludin Blazevic: Hi all, I'm running Pacific with cephadm. After installation, ceph