Yes, the empty DB told me that at this point I had no other choice
than to recreate the entire mon service.
* remove broken mon
ceph mon remove $(hostname -s)
* prepare a fresh mon directory
rm -rf /var/lib/ceph/mon/ceph-$(hostname -s)
mkdir /var/lib/ceph/mon/ceph-$(hostname -s)
ceph auth get mon. -o
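(For reference, the remaining steps of the standard add-a-monitor procedure would be roughly as follows; the /tmp paths are placeholders:)

  ceph auth get mon. -o /tmp/mon.keyring       # export the mon. key
  ceph mon getmap -o /tmp/monmap               # fetch the current monmap
  ceph-mon -i $(hostname -s) --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
  chown -R ceph:ceph /var/lib/ceph/mon/ceph-$(hostname -s)
  systemctl start ceph-mon@$(hostname -s)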
Your log ends with
> 2021-07-25 06:46:52.078 7fe065f24700 1 mon.osd01@0(leader).osd e749666
> do_prune osdmap full prune enabled
So mon.osd01 was still the leader at that time.
When did it leave the cluster?
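(If the full mon log is still available, grepping for election events around that timestamp might show when it dropped out; the log path below is the default and just an assumption:)

  grep -E 'calling new monitor election|won leader' /var/log/ceph/ceph-mon.osd01.log | tail -n 50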
> I also found that the rocksdb on osd01 is only 1MB in size and 345MB on the
> other mons
Hi Dan, hi Folks,
this is how things started. I also found that the rocksdb on osd01 is
only 1MB in size but 345MB on the other mons!
2021-07-25 06:46:30.029 7fe061f1c700 0 log_channel(cluster) log [DBG]
: monmap e1: 3 mons at
{osd01=[v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0],osd02=[v2:10
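(To compare the mon store sizes mentioned above, assuming the default data path, something like:)

  du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db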
Hi,
Do you have ceph-mon logs from when mon.osd01 first failed, before the
on-call team rebooted it? They might give a clue about what happened to
start this problem, which may still be happening now.
This looks similar, but it was eventually found to be a network issue:
https://tracker.ceph.com/issues
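(If the file logs have already rotated, the journal may still cover that window; a sketch, with the time range assumed:)

  journalctl -u ceph-mon@osd01 --since '2021-07-24' --until '2021-07-25 07:00'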
On Sun, Jul 25, 2021 at 6:02 PM Dan van der Ster wrote:
>
> What do you have for the new global_id settings? Maybe set it to allow
> insecure global_id auth and see if that allows the mon to join?
auth_allow_insecure_global_id_reclaim is still allowed, as we have
some VMs that haven't been restarted yet
What do you have for the new global_id settings? Maybe set it to allow
insecure global_id auth and see if that allows the mon to join?
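(A sketch of how to check and, if needed, relax that setting with the standard config commands:)

  ceph config get mon auth_allow_insecure_global_id_reclaim
  ceph config set mon auth_allow_insecure_global_id_reclaim true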
> I can try to move the /var/lib/ceph/mon/ dir and recreate it!?
I'm not sure it will help. Running the mon with --debug_ms=1 might give
clues why it's stuck probing.
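(A sketch of running it in the foreground with that debug level; stop the unit first:)

  systemctl stop ceph-mon@osd01
  ceph-mon -i osd01 -d --debug_ms 1    # -d: foreground, log to stderr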
On Sun, Jul 25, 2021 at 5:17 PM Dan van der Ster wrote:
>
> > raise the min version to nautilus
>
> Are you referring to the min osd version or the min client version?
Yes, sorry, that was not written clearly.
> I don't think the latter will help.
>
> Are you sure that mon.osd01 can reach those other mons on ports 6789 and 3300?
> raise the min version to nautilus
Are you referring to the min osd version or the min client version?
I don't think the latter will help.
Are you sure that mon.osd01 can reach those other mons on ports 6789 and
3300?
Do you have any notable custom ceph configurations on this cluster?
.. Dan
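(A quick way to verify reachability from osd01 and review custom settings; the peer addresses are placeholders:)

  nc -zv <other-mon-ip> 6789
  nc -zv <other-mon-ip> 3300
  ceph config dump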
Hi Dan, hi Folks,
I started the mon on osd01 in the foreground with debugging and basically got
this loop! Maybe raising the min version to nautilus would help, but
I'm afraid to run those commands on a cluster in its current state:
mon.osd01@0(probing).auth v0 _set_mon_num_rank num 0 rank 0
mon.osd01@0
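(For reference, the "min version" commands in question would presumably be these two; shown here only for context, not run:)

  ceph osd require-osd-release nautilus
  ceph osd set-require-min-compat-client nautilus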
With four mons in total, only one can be down... mon.osd01 is down already, so
you're at the limit.
It's possible that whichever reason is preventing this mon from joining
will also prevent the new mon from joining.
I think you should:
1. Investigate why mon.osd01 isn't coming back into the quorum.
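(While investigating, a standard check of who is actually in quorum:)

  ceph quorum_status --format json-pretty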