"340 osds: 101 up, 112 in" This is going to be your culprit. Your CRUSH
map is in a really weird state. How many OSDs do you have in this
cluster? When OSDs go down, secondary OSDs take over for it, but when OSDs
get marked out, the cluster re-balances to distribute the data according to
how
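
For reference, the up/in counts and the CRUSH hierarchy can be checked
with the standard CLI (a minimal sketch, assuming admin access on a mon
node):

    ceph osd stat                        # summary line, e.g. "340 osds: 101 up, 112 in"
    ceph osd tree                        # CRUSH hierarchy with per-OSD up/down status
    ceph osd getcrushmap -o crush.bin    # dump the compiled CRUSH map
    crushtool -d crush.bin -o crush.txt  # decompile it for inspection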
Hello,
Env: Bluestore, EC 4+1, v11.2.0, RHEL 7.3, 16383 PGs
We did our resiliency testing and found that OSDs keep flapping and the
cluster went into an error state.
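
The flapping and the resulting health errors can be confirmed with
something like the following (a sketch, assuming the default log
locations):

    ceph -s                # overall cluster state
    ceph health detail     # per-PG / per-OSD detail behind the error state
    ceph -w                # watch cluster events live while the flapping happens
    # an OSD that is being flapped typically logs this on the affected node:
    grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log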
What we did:
1. We have a 5-node cluster.
2. Powered off / stopped ceph.target on the last node and waited until
everything seemed to return to normal (commands sketched below).
3.
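
Step 2 above typically amounts to the following (a sketch, assuming
systemd-managed daemons as on RHEL 7.3; noout is optional but keeps the
cluster from marking the stopped OSDs out during a planned outage):

    ceph osd set noout              # optional: suppress automatic out-marking
    systemctl stop ceph.target      # on the node being failed
    ceph -s                         # from another node: watch peering/recovery settle
    systemctl start ceph.target     # bring the node back
    ceph osd unset noout

Without noout, down OSDs get marked out after mon_osd_down_out_interval
(600 seconds by default), which triggers the rebalancing described in
the reply above.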