Hello people :) we are facing a situation quite similar to the one described here: http://tracker.ceph.com/issues/23117
Namely: we have a Luminous cluster consisting of 16 hosts, where each host holds 12 OSDs on spinning disks and 4 OSDs on SSDs. Let's forget the SSDs for now, since they're not used atm. We have an Erasure Coding pool (k=6, m=3) with 4096 PGs, residing on the spinning disks, with the host as failure domain.

After taking a host (and its OSDs) out for maintenance, we're trying to put the OSDs back in. While the cluster starts recovering, we observe:

> Reduced data availability: 170 pgs inactive
> 170 activating+remapped

This eventually leads to slow/stuck requests, and we have to take the OSDs out again. While searching around we came across the tracker issue already mentioned above [1], and we're wondering whether "PG overdose protection" [2] is what we're actually facing. Our cluster is configured with:

    "mon_max_pg_per_osd": "200",
    "osd_max_pg_per_osd_hard_ratio": "2.000000",

What is more, we observed that the PG distribution among the OSDs is not uniform, e.g.:

>  ID CLASS WEIGHT    REWEIGHT SIZE   USE    AVAIL   %USE  VAR PGS TYPE NAME
>  -1       711.29004        -   666T   165T   500T      0    0   - root default
> -17        44.68457        - 45757G 11266G 34491G  24.62 0.99   -     host rd3-1427
>   9   hdd   3.66309  1.00000  3751G   976G  2774G  26.03 1.05 212         osd.9
>  30   hdd   3.66309  1.00000  3751G   961G  2789G  25.64 1.03 209         osd.30
>  46   hdd   3.66309  1.00000  3751G   902G  2848G  24.07 0.97 196         osd.46
>  61   hdd   3.66309  1.00000  3751G   877G  2873G  23.40 0.94 190         osd.61
>  76   hdd   3.66309  1.00000  3751G   984G  2766G  26.24 1.05 214         osd.76
>  92   hdd   3.66309  1.00000  3751G   894G  2856G  23.84 0.96 194         osd.92
> 107   hdd   3.66309  1.00000  3751G   881G  2869G  23.50 0.94 191         osd.107
> 123   hdd   3.66309  1.00000  3751G   973G  2777G  25.97 1.04 212         osd.123
> 138   hdd   3.66309  1.00000  3751G   975G  2775G  26.01 1.05 212         osd.138
> 156   hdd   3.66309  1.00000  3751G   813G  2937G  21.69 0.87 176         osd.156
> 172   hdd   3.66309  1.00000  3751G  1016G  2734G  27.09 1.09 221         osd.172
> 188   hdd   3.66309  1.00000  3751G   998G  2752G  26.62 1.07 217         osd.188

Could these OSDs, holding more than 200 PGs, be contributing to the problem? Is there any way to confirm that we're hitting "PG overdose protection", and if so, how can we bring our cluster back to normal?

Apart from getting these OSDs back into service, we're also concerned about the overall choice of 4096 PGs for that (6,3) EC pool. (Some back-of-the-envelope math and the checks we have in mind are sketched after the references below.)

Any help appreciated,
Alex

[1] http://tracker.ceph.com/issues/23117
[2] https://ceph.com/community/new-luminous-pg-overdose-protection/
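For context, here is the back-of-the-envelope math behind our sizing concern. It assumes the EC pool is the only pool placed on the 16 x 12 = 192 HDD OSDs:

    # Average PG shards per OSD for the (6,3) EC pool.
    pgs=4096
    shards=9                          # k=6 data + m=3 coding chunks per PG
    osds=192                          # 16 hosts x 12 HDD OSDs
    echo $(( pgs * shards / osds ))   # -> 192 PG shards per OSD on average

    # With one host out for maintenance, the same shards land on 180 OSDs:
    echo $(( pgs * shards / 180 ))    # -> ~204, already above mon_max_pg_per_osd=200

So the average sits right at the 200 soft limit even with all hosts in, and the skew we see (up to 221 PGs on osd.172), plus PGs being held twice during remapping, could push individual OSDs towards the hard limit of 200 * 2 = 400, at which point, as we understand [2], an OSD refuses to accept new PGs.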
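To check whether we're really hitting the limit, we had something like the following in mind (a sketch; the awk filter assumes PGS is the last column of Luminous `ceph osd df` output):

    # Effective limits as seen by a running OSD (via its admin socket):
    ceph daemon osd.9 config get mon_max_pg_per_osd
    ceph daemon osd.9 config get osd_max_pg_per_osd_hard_ratio

    # List OSDs whose PG count exceeds the hard limit (200 * 2 = 400):
    ceph osd df | awk '$NF ~ /^[0-9]+$/ && $NF > 400'

    # If confirmed, would temporarily raising the limit let the stuck PGs
    # activate? (injectargs may report the change as unobserved, in which
    # case we'd set it in ceph.conf and restart instead)
    ceph tell 'osd.*' injectargs '--mon_max_pg_per_osd 300'
    ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 300'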
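And independently of the overdose question: once the cluster is healthy again, would it make sense to even out the PG distribution, e.g. with the Luminous mgr balancer module (or the older reweight approach)?

    # mgr balancer module (Luminous):
    ceph mgr module enable balancer
    ceph balancer mode crush-compat
    ceph balancer on

    # or the pre-balancer route, reweighting OSDs above 110% of mean utilization:
    ceph osd reweight-by-utilization 110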