Hello Daniel,
The situation is not as bad as you described. It is just
PG_BACKFILL_FULL, which means that if the backfills proceed, one OSD
will become backfillfull (i.e., over 90% full by default).
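Here is a minimal Python sketch of that threshold. It uses a simplified model: an OSD counts as backfillfull if its used space plus the data still to be backfilled onto it exceeds backfillfull_ratio times its capacity. The 0.90 default matches the one mentioned above, and the cluster's actual ratio is listed by `ceph osd dump`. The function name and the example sizes are hypothetical, and Ceph's internal accounting is more involved than this.

```python
# Assumed default; the cluster's actual value is listed by `ceph osd dump`.
DEFAULT_BACKFILLFULL_RATIO = 0.90


def would_be_backfillfull(used: float, incoming: float, size: float,
                          ratio: float = DEFAULT_BACKFILLFULL_RATIO) -> bool:
    """Simplified model (hypothetical helper): True if used space plus the
    incoming backfill data would push the OSD above `ratio` of its capacity.
    All sizes must be in the same unit."""
    if size <= 0:
        raise ValueError("size must be positive")
    projected = (used + incoming) / size
    return projected > ratio


if __name__ == "__main__":
    # Hypothetical 4000 GB OSD with 3400 GB already used.
    print(would_be_backfillfull(3400, 300, 4000))  # True  (92.5%)
    print(would_be_backfillfull(3400, 100, 4000))  # False (87.5%)
```

In this example, another 300 GB of backfill would take the OSD past the default threshold. Only 100 GB would leave it below.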
This is definitely something that the balancer should be able to
resolve if it were allowed to act.
The backfilling was caused by decommissioning an old host and moving a
bunch of OSDs to new machines.
The balancer has not been activated since the backfill started and the
OSDs were moved between hosts.
Busy OSD level? Do you mean fullness? In terms of load, the cluster is
relatively idle.
Hi Daniel,
Changing pg_num when some OSDs are almost full is not a good strategy (it
can even be dangerous).
What is causing this backfilling? The loss of an OSD? The balancer? Something else?
What is the least busy OSD level (sort -nrk17)?
Is the balancer activated (in upmap mode)?
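The `sort -nrk17` question amounts to listing OSDs from fullest to emptiest. Below is a minimal Python sketch of the same ordering using JSON output instead of column positions. It assumes that `ceph osd df -f json` returns a `nodes` list whose entries have `name` and `utilization` (percent) fields. Check this against the release in use. The helper names and the sample data are hypothetical.

```python
import json
from typing import List, Tuple

# Assumed default backfillfull ratio; check the real one with `ceph osd dump`.
BACKFILLFULL_RATIO = 0.90


def osds_by_utilization(osd_df_json: str) -> List[Tuple[str, float]]:
    """Return (OSD name, %USE) pairs sorted from fullest to emptiest.

    Assumes the JSON has a "nodes" list with "name" and "utilization" keys.
    """
    data = json.loads(osd_df_json)
    rows = [(node["name"], float(node["utilization"]))
            for node in data.get("nodes", [])]
    # Descending by utilization, like `sort -nr` on the %USE column.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def above_backfillfull(rows: List[Tuple[str, float]],
                       ratio: float = BACKFILLFULL_RATIO) -> List[str]:
    """Names of OSDs whose utilization (a percentage) exceeds the ratio."""
    return [name for name, pct in rows if pct / 100.0 > ratio]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed sample of `ceph osd df -f json` output.
    sample = json.dumps({"nodes": [
        {"id": 0, "name": "osd.0", "utilization": 62.5},
        {"id": 1, "name": "osd.1", "utilization": 91.2},
        {"id": 2, "name": "osd.2", "utilization": 48.0},
    ]})
    rows = osds_by_utilization(sample)
    print(rows)                      # [('osd.1', 91.2), ('osd.0', 62.5), ('osd.2', 48.0)]
    print(above_backfillfull(rows))  # ['osd.1']
```

Sorting on a named JSON field avoids depending on column positions, which can shift between releases or output formats.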
Once the situation stabilizes, it becomes