Hello Josh,
I simulated the osd.14 failure with the following steps:
1. hot unplug the disk
2. systemctl stop ceph-osd@14
3. ceph osd out 14
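For reference, steps 2 and 3 above can be scripted roughly like this (the hot-unplug itself is done by hand; commands assume a systemd-managed cluster):

```shell
# Sketch of the osd.14 failure simulation described above.
OSD_ID=14

# Step 2: stop the OSD daemon.
systemctl stop ceph-osd@${OSD_ID}

# Step 3: mark the OSD out so CRUSH remaps its PGs and backfill starts.
ceph osd out ${OSD_ID}

# Watch recovery progress.
ceph -s
```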
The CRUSH rule used to create the EC 8+3 pool is shown below:
# ceph osd crush rule dump erasure_hdd_mhosts
{
"rule_id": 8,
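One way to inspect the full rule and sanity-check its placements offline, using the rule_id from the dump above:

```shell
# Export and decompile the CRUSH map (crushtool ships with the ceph packages).
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Simulate placements for rule_id 8 with k+m = 11 shards, to verify the
# rule still finds enough distinct hosts when one host's OSDs are gone.
crushtool -i crushmap.bin --test --rule 8 --num-rep 11 --show-mappings
```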
Hi Igor,
Thanks for your response. This problem happens on my osds with hdd disks. I
set the bluefs_buffered_io to true just for these osds but it caused my
bucket index disks (which are ssd) to produce slow ops. I also tried to set
bluefs_buffered_io to true in bucket index osds but they filled
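In case it helps: since Nautilus, a setting like this can be scoped with a device-class config mask, so it applies only to the hdd OSDs and leaves the ssd bucket-index OSDs at the default. A sketch:

```shell
# Enable buffered bluefs I/O only for OSDs whose device class is hdd.
ceph config set osd/class:hdd bluefs_buffered_io true

# Verify what a given OSD actually picked up (osd.0 is a placeholder).
ceph config get osd.0 bluefs_buffered_io
```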
On Sun, 25 Jul 2021 at 18:02, Dan van der Ster wrote:
>
> What do you have for the new global_id settings? Maybe set it to allow
> insecure global_id auth and see if that allows the mon to join?
auth_allow_insecure_global_id_reclaim is allowed, as we still have
some VMs that haven't been restarted yet
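The current global_id reclaim settings, and whether any clients still reconnect insecurely, can be checked like this:

```shell
# Show the mon-side global_id reclaim settings.
ceph config get mon auth_allow_insecure_global_id_reclaim
ceph config get mon auth_expose_insecure_global_id_reclaim

# Any AUTH_INSECURE_* warnings list the clients still affected.
ceph health detail | grep -i global_id
```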
What do you have for the new global_id settings? Maybe set it to allow
insecure global_id auth and see if that allows the mon to join?
> I can try to move the /var/lib/ceph/mon/ dir and recreate it!?
I'm not sure it will help. Running the mon with --debug_ms=1 might give
clues why it's stuck
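Running the mon in the foreground with that debug level could look like this (stop the systemd unit first so two daemons don't fight over the store):

```shell
# Run the stuck mon in the foreground with messenger debugging enabled.
systemctl stop ceph-mon@osd01
ceph-mon -f -i osd01 --debug_ms 1 2>&1 | tee /tmp/mon-osd01-debug.log
```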
On Sun, 25 Jul 2021 at 17:17, Dan van der Ster wrote:
>
> > raise the min version to nautilus
>
> Are you referring to the min osd version or the min client version?
Yes, sorry, that wasn't written clearly.
> I don't think the latter will help.
>
> Are you sure that mon.osd01 can reach those
> raise the min version to nautilus
Are you referring to the min osd version or the min client version?
I don't think the latter will help.
Are you sure that mon.osd01 can reach those other mons on ports 6789 and
3300?
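Reachability of both messenger ports can be checked from the osd01 node with something like this (hostnames are placeholders; nc flags vary slightly between implementations):

```shell
# Probe the v1 (6789) and v2 (3300) mon ports on each peer mon.
for h in mon2 mon3 mon4; do
  for p in 6789 3300; do
    nc -zv -w 3 "$h" "$p"
  done
done
```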
Do you have any notable custom ceph configurations on this cluster?
-- Dan
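Non-default settings can be listed like this (local ceph.conf overrides on the affected node won't show up in the mon config db, so check both):

```shell
# Cluster-wide non-default settings stored in the mon config database.
ceph config dump

# Plus anything set locally on the node (comment lines stripped).
grep -v '^#' /etc/ceph/ceph.conf
```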
Hi Dan, hi folks,
I started mon.osd01 in the foreground with debugging and basically got
this loop! Maybe raising the min version to nautilus would help, but
I'm afraid to run those commands on a cluster in the current state:
mon.osd01@0(probing).auth v0 _set_mon_num_rank num 0 rank 0
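While a mon loops in "probing", its admin socket usually still answers; mon_status shows which peers it sees and what it thinks the monmap and quorum are:

```shell
# Query the stuck mon directly via its admin socket.
ceph daemon mon.osd01 mon_status
```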
With four mons total, only one can be down... mon.osd01 is down already,
so you're at the limit.
It's possible that whatever is preventing this mon from joining
will also prevent a new mon from joining.
I think you should:
1. Investigate why mon.osd01 isn't coming back into the
Hi folks,
I have a cluster running Ceph 14.2.22 on Ubuntu 18.04. Some hours
ago one of the mons stopped working and the on-call team rebooted the
node; now the mon is not joining the ceph cluster.
TCP ports of mons are open and reachable!
ceph health detail
HEALTH_WARN 1/3 mons down,