[ceph-users] Re: Reweighting OSD while down results in undersized+degraded PGs

2020-05-20 Thread Andras Pataki
Hi Dan, Unfortunately 'ceph osd down osd.0' doesn't help - it is marked down and soon after back up, but it still doesn't peer. I tried reweighting the OSD to half its weight, 4.0 instead of 0.0, and that results in about half the PGs staying degraded. So this is not specific to zero weight.

[ceph-users] Re: Reweighting OSD while down results in undersized+degraded PGs

2020-05-20 Thread Andras Pataki
Hi Frank, Thanks for the explanation - I wasn't aware of this subtle point. So one has to be very careful about changing the cluster while some OSDs are down. I guess one could even end up with incomplete PGs this way that ceph can't recover from in an automated fashion? Andras On 5/1

[ceph-users] Re: Reweighting OSD while down results in undersized+degraded PGs

2020-05-20 Thread Dan van der Ster
Hi Andras, To me it looks like osd.0 is not peering when it starts with crush weight 0. I would try forcing the re-peering with `ceph osd down osd.0` when the PGs are unexpectedly degraded. (e.g. start the osd when crush weight is 0, then observe the PGs are still degraded, then force the re-p

[ceph-users] Re: Reweighting OSD while down results in undersized+degraded PGs

2020-05-19 Thread Frank Schilder
Hi Andreas, the cluster map and crush map are not the same thing. If you change the crush map while the cluster is in degraded state, you basically modify this history of cluster maps explicitly and have to live with the consequences (keeping history under crush map changes is limited to up+in

[ceph-users] Re: Reweighting OSD while down results in undersized+degraded PGs

2020-05-19 Thread Andras Pataki
Hi Frank, My understanding was that once a cluster is in a degraded state (an OSD is down), ceph stores all changed cluster maps until the cluster is healthy again exactly for the reason of finding missing objects. If there is a real disaster of some kind, and many OSDs go up and down at vari

[ceph-users] Re: Reweighting OSD while down results in undersized+degraded PGs

2020-05-19 Thread Frank Schilder
Hi Andreas, I made exactly the same observation in another scenario. I added some OSDs while other OSDs were down. This is expected. The crush map is an a priori algorithm to compute the location of objects without contacting a central server. Hence, *any* change of a crush map while an OSD is
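
To illustrate the point about placement being computed rather than looked up, here is a minimal toy sketch in Python. It is not Ceph's CRUSH implementation: the `draw` and `place` functions, the six-OSD layout, and the use of SHA-256 are all assumptions made for illustration, loosely modelled on the idea of a weighted hash draw per OSD. It only shows that the expected location of a PG is a pure function of the weights, so editing a weight changes where a PG is expected to live.

```python
import hashlib
import math


def draw(pg_id, osd, weight):
    """Toy weighted draw: ln(u) / weight, with u a hash-derived value in (0, 1]."""
    if weight <= 0:
        return float("-inf")  # a zero-weight OSD can never win a draw
    digest = hashlib.sha256(f"{pg_id}:{osd}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") + 1) / 2**64
    return math.log(u) / weight


def place(pg_id, weights, replicas=3):
    """Return the OSDs with the highest draws for this PG (hypothetical helper)."""
    ranked = sorted(weights, key=lambda osd: draw(pg_id, osd, weights[osd]), reverse=True)
    return [osd for osd in ranked[:replicas] if weights[osd] > 0]


# Hypothetical cluster: six OSDs of weight 8.0 each, 256 PGs.
before = {f"osd.{i}": 8.0 for i in range(6)}
half = dict(before, **{"osd.0": 4.0})   # osd.0 reweighted to half
zero = dict(before, **{"osd.0": 0.0})   # osd.0 reweighted to zero
pgs = list(range(256))

on_osd0_before = [pg for pg in pgs if "osd.0" in place(pg, before)]
on_osd0_half = [pg for pg in pgs if "osd.0" in place(pg, half)]
moved_by_zero = [pg for pg in pgs if place(pg, before) != place(pg, zero)]

print(len(on_osd0_before), len(on_osd0_half), len(moved_by_zero))
```

In this toy model only the PGs that were mapped to osd.0 get a new computed placement, and a partial reweight remaps only a subset of them. What a real cluster does with data on an OSD that is down at the time of such a change is the subject of the thread and is not modelled here.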