[ceph-users] Re: How to clear "Too many repaired reads on 1 OSDs" on pacific

2022-02-28 Thread Anthony D'Atri
I would think that such an error has a failing drive as the root cause, so investigate that first. Destroying and redeploying such an OSD should take care of it. > On Feb 28, 2022, at 5:04 PM, Szabo, Istvan (Agoda) wrote: > > Restart osd. > > Istvan Szabo > Senior Infrastructure
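A minimal sketch of that check-then-redeploy flow (osd.12, host1 and /dev/sdX are placeholders, and the last step assumes a cephadm-managed cluster):

    smartctl -a /dev/sdX                          # check the drive's SMART health first
    ceph osd out 12                               # let data migrate off the suspect OSD
    ceph osd destroy 12 --yes-i-really-mean-it    # keep the OSD id, drop its cephx key and metadata
    ceph orch daemon add osd host1:/dev/sdX       # redeploy on the replacement drive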

[ceph-users] Understanding RGW multi zonegroup replication topology

2022-02-28 Thread Mark Selby
I am designing a Ceph RGW multisite configuration. I have done a fair bit of reading but am still having trouble grokking the utility of having multiple zonegroups within a realm. I know that all metadata is replicated between zonegroups and that replication can be set up between zones across
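For reference, a hedged sketch of adding a second zonegroup to an existing realm (myrealm, zg2, zg2-a and the endpoint URL are made-up placeholders):

    radosgw-admin zonegroup create --rgw-realm=myrealm --rgw-zonegroup=zg2 \
        --endpoints=http://rgw-zg2.example.com:8080
    radosgw-admin zone create --rgw-zonegroup=zg2 --rgw-zone=zg2-a \
        --endpoints=http://rgw-zg2.example.com:8080 --master --default
    radosgw-admin period update --commit          # commit the new period so it propagates across the realm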

[ceph-users] Errors when scrub ~mdsdir and lots of num_strays

2022-02-28 Thread Arnaud M
Hello to everyone. Our ceph cluster is healthy and everything seems to go well, but we have a lot of num_strays: ceph tell mds.0 perf dump | grep stray "num_strays": 1990574, "num_strays_delayed": 0, "num_strays_enqueuing": 0, "strays_created": 3,
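For context, scrubbing the stray directory and watching the counters usually looks roughly like this (rank 0 assumed, matching the perf dump above):

    ceph tell mds.0 scrub start ~mdsdir recursive    # scrub the stray directory under rank 0
    ceph tell mds.0 scrub status                     # check scrub progress and any reported errors
    ceph tell mds.0 perf dump | grep num_strays      # watch whether the stray count goes down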

[ceph-users] Re: removing osd, reweight 0, backfilling done, after purge, again backfilling.

2022-02-28 Thread Marc
> > I have a clean cluster state, with the OSDs that I am going to remove at a > reweight of 0. And then after executing 'ceph osd purge 19', I again have > remapping+backfilling? > > Is this indeed the correct procedure, or is this old?
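One possible explanation: if the drain was done with the override weight ('ceph osd reweight') rather than the CRUSH weight, the OSD still carries its CRUSH weight, so purging it changes the CRUSH map and triggers a second round of remapping. A sketch of the CRUSH-weight variant, using osd.19 from the quoted command:

    ceph osd crush reweight osd.19 0              # drain by zeroing the CRUSH weight
    # wait until backfill finishes and the cluster is HEALTH_OK
    ceph osd out 19
    ceph osd purge 19 --yes-i-really-mean-it      # now removes an already-empty CRUSH item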

[ceph-users] How to clear "Too many repaired reads on 1 OSDs" on pacific

2022-02-28 Thread Sascha Vogt
Hi all, I'd like to clear the "too many repaired reads" warning. In the changelog (and in some mailing list entries) I found that in Nautilus the flag "clear_shards_repaired" was added (issued via ceph tell), but unfortunately when trying to execute it, I get a "no valid command found"
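For reference, the variants that come up for this warning; availability of the tell command depends on the exact release, which would explain the "no valid command found" (osd.7 is a placeholder):

    ceph tell osd.7 clear_shards_repaired               # resets the repaired-reads counter on newer releases
    ceph health mute OSD_TOO_MANY_REPAIRS               # or mute the health warning until the drive is dealt with
    ceph config set osd mon_osd_warn_num_repaired 50    # or raise the warning threshold (default 10)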

[ceph-users] mclock and background best effort

2022-02-28 Thread Luis Domingues
Hello, As we are testing the mClock scheduler, we have a question for which we did not find any answer in the documentation. The documentation says mClock has three types of load: client, recovery, and best effort. I guess client is the client traffic, and recovery is the recovery when something goes wrong.
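A quick way to inspect and switch the built-in mClock profiles while testing (osd.0 is just an example daemon id):

    ceph config show osd.0 osd_mclock_profile                  # balanced / high_client_ops / high_recovery_ops / custom
    ceph config set osd osd_mclock_profile high_recovery_ops   # favour recovery over client I/O cluster-wide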

[ceph-users] Re: Single-site cluster - multiple RGW issue

2022-02-28 Thread Adam Olszewski
even more - "rgw: " disappears from "services: " section. pon., 28 lut 2022 o 11:29 Janne Johansson napisał(a): > Den mån 28 feb. 2022 kl 11:18 skrev Adam Olszewski < > adamolszewski...@gmail.com>: > > > > Hi Janne, > > Thanks for reply. > > It's not related to network, when one rack is down

[ceph-users] Re: Single-site cluster - multiple RGW issue

2022-02-28 Thread Janne Johansson
On Mon, 28 Feb 2022 at 11:18, Adam Olszewski wrote: > Hi Janne, > Thanks for the reply. > It's not related to the network; when one rack is down (containing one RGW host), > the 'ceph -s' command shows no RGW services, however the systemd ceph daemons are > running on the second RGW. There is no event in ceph

[ceph-users] Re: Single-site cluster - multiple RGW issue

2022-02-28 Thread Adam Olszewski
Hi Janne, Thanks for the reply. It's not related to the network; when one rack is down (containing one RGW host), the 'ceph -s' command shows no RGW services, however the systemd ceph daemons are running on the second RGW. There is no event in the ceph crash list. On Mon, 28 Feb 2022 at 11:08, Janne Johansson wrote:

[ceph-users] Re: Single-site cluster - multiple RGW issue

2022-02-28 Thread Janne Johansson
On Mon, 28 Feb 2022 at 10:40, Adam Olszewski wrote: > Hi, > We have deployed two RGW hosts with two containers each in our single-site > cluster. > When either of these two hosts is down, the second one becomes unresponsive too, > returning error 500. Are they connected in some way? They are not. You
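A hedged set of checks for comparing what the cluster reports against what systemd is actually running (the orch command assumes a cephadm deployment):

    ceph -s | grep -A1 rgw             # what the service map currently advertises
    ceph orch ps --daemon-type rgw     # cephadm's view of the rgw daemons and their hosts
    ceph service dump | grep -i rgw    # raw service map entries registered by the RGWs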

[ceph-users] Single-site cluster - multiple RGW issue

2022-02-28 Thread Adam Olszewski
Hi, We have deployed two RGW hosts with two containers each in our single-site cluster. When either of these two hosts is down, the second one becomes unresponsive too, returning error 500. Are they connected in some way? OSDs are split across two racks in the crushmap, as well as the MDS (2 instances) and MGR (3
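A simple way to test whether the two RGW endpoints really are independent, and to confirm the rack split (host names and port 8080 are placeholders for whatever the frontends actually listen on):

    curl -i http://rgw-host1:8080/    # each RGW should answer on its own, regardless of the other host
    curl -i http://rgw-host2:8080/
    ceph osd tree                     # confirm how the racks are laid out in the CRUSH map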