Forgot to mention - we are running jewel, 10.2.10.

On 26/03/18 11:30, Josef Zelenka wrote:
Hi everyone, I'm currently fighting an issue in a cluster we run for a customer. It's used for a lot of small files (113M currently) that are pulled via radosgw. We have 3 nodes, 24 OSDs in total. The index and other non-data pools are migrated to a separate CRUSH root called "ssd", which contains only SSD drives - each node has one SSD in this root. We did this because we previously had an issue where, if a normal OSD (an HDD) crashed, the entire RGW stopped working.

Today one of the SSDs crashed. After replacing the drive, putting a new one in and starting recovery, RGW halted writes: reads worked fine, but we couldn't upload any more files. The non-data pools all have size set to 3, so there should still have been 2 healthy copies of the index data. Also, while recovery was running, no recovery I/O was shown in the ceph -s output, so we checked progress through df; once the SSD had backfilled, ceph -s went from X degraded PGs back to OK instantly.

Does anyone know how to fix this? I don't think writes should be halted during recovery.
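For anyone hitting something similar, a few diagnostic commands that may help narrow this down - a sketch only, assuming jewel-era default tooling, and the pool name ".rgw.buckets.index" is an assumption (substitute your actual index pool):

```shell
# Check the replica settings on the index pool. With size=3 and
# min_size=2, losing one of three SSD OSDs should still allow writes;
# min_size=3 would explain writes blocking until recovery completes.
ceph osd pool get .rgw.buckets.index size
ceph osd pool get .rgw.buckets.index min_size

# List PGs that are stuck or degraded while the new OSD backfills,
# and check health detail for blocked requests.
ceph pg dump_stuck unclean
ceph health detail

# Recovery/backfill progress sometimes only shows up per-OSD,
# so compare per-OSD usage rather than relying on ceph -s alone.
ceph osd df tree
```

If min_size on the index pool turns out to be equal to size, lowering it to 2 (ceph osd pool set &lt;pool&gt; min_size 2) is the usual fix, but check your durability requirements first.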

Thanks

Josef Z

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
