[ceph-users] bunch of " received unsolicited reservation grant from osd" messages in log

2021-10-29 Thread Alexander Y. Fomichev
Hello. After upgrading to 'pacific' I found the log spammed by messages like this: ... active+clean] scrubber pg(46.7aas0) handle_scrub_reserve_grant: received unsolicited reservation grant from osd 138(1) (0x560e77c51600) If I understand it correctly this is exactly what it looks like, and this is not

[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-29 Thread Edward R Huyer
Ok, I figured out how to feed the containerized SAML module the x509 certificate. The solution is dumb and hacky, but it appears to work. I’m putting what I did and why it worked here in case others come looking. The cert and key need to be put somewhere on the host filesystem that is also
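
For readers who land here later, a rough sketch of the workaround described above, assuming a containerized (cephadm-style) deployment; the host path that ends up bind-mounted into the mgr container depends on your setup, and the URLs and file names below are only placeholders:

    # Place the cert and key somewhere that is visible inside the mgr
    # container, then point the SSO setup at the path as seen *inside*
    # the container:
    ceph dashboard sso setup saml2 \
        https://dashboard.example.com:8443 \
        /path/inside/container/idp-metadata.xml \
        uid \
        https://idp.example.com/metadata \
        /path/inside/container/sp-cert.pem \
        /path/inside/container/sp-key.pem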

[ceph-users] Progess on the support of RDMA over RoCE

2021-10-29 Thread huxia...@horebdata.cn
Dear Cephers, As we all know, Ceph performance based on NVMe SSD is quite sensitive to network latency. In this sense, RDMA over a RoCE network would be very interesting and desirable. Could someone elaborate on the progress made so far on Ceph to support RDMA over RoCE (such as Mellanox
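
For context, the RDMA messenger is normally switched on through the ms_type options; a minimal, hedged sketch (the device name is a placeholder for your RoCE NIC, and RDMA messenger support is still considered experimental):

    # ceph.conf - illustrative only
    [global]
    ms_type = async+rdma
    ms_async_rdma_device_name = mlx5_0   # placeholder RoCE device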

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread 胡 玮文
Hi Michel, This “Structure needs cleaning” seems to mean that your file system is not in order, you should try “fsck”. Weiwen Hu From: Michel Niyoyita Sent: 29 October 2021 20:10 To: Etienne Menguy Cc: ceph-users
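
As a hedged illustration of the suggestion above (device and mount point are placeholders, and the daemon using the filesystem must be stopped first; note that "Structure needs cleaning" usually comes from XFS, in which case xfs_repair rather than fsck is the tool that actually does the repair):

    umount /var/lib/ceph/osd/ceph-0      # placeholder mount point
    xfs_repair /dev/sdX1                 # if the filesystem is XFS
    # or, for ext4:
    fsck -y /dev/sdX1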

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
The OSDs are up and in , I have the problem on PGs as you see below root@ceph-mon1:~# ceph -s cluster: id: 43f5d6b4-74b0-4281-92ab-940829d3ee5e health: HEALTH_ERR 1/3 mons down, quorum ceph-mon1,ceph-mon3 14/32863 objects unfound (0.043%) Possible
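
For the "objects unfound" part of the status above, the usual way to investigate is roughly the following; the PG id is only an example, and marking objects lost is a last resort that discards data:

    ceph health detail                      # shows which PGs have unfound objects
    ceph pg 5.10 list_unfound               # example PG id
    # last resort, loses the unfound objects:
    ceph pg 5.10 mark_unfound_lost revert   # or 'delete'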

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
Could your hardware be faulty? You are trying to deploy the faulty monitor? Or a whole new cluster? If you are trying to fix your cluster, you should focus on the OSDs. A cluster can run without big troubles with 2 monitors for a few days (if not years…). - Etienne Menguy etienne.men...@croit.io

[ceph-users] Re: [IMPORTANT NOTICE] Potential data corruption in Pacific

2021-10-29 Thread Igor Fedotov
Hi Tobias, thanks a lot for your input, some improvement of the critical issue notification process is indeed needed. We have an active discussion in the dev community (the CLT group specifically) on how to better arrange such notifications. Perhaps it would be good to have a wider audience

[ceph-users] Re: slow operation observed for _collection_list

2021-10-29 Thread Igor Fedotov
Please manually compact the DB using ceph-kvstore-tool for all the affected OSDs (or preferable every OSD in the cluster). Highly likely you're facing RocksDB performance degradation caused by prior bulk data removal. Setting bluefs_buffered_io to true (if not yet set) might be helpful as
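
A sketch of the manual compaction for one OSD, assuming a package-based install with the default data path (the OSD id is an example, and the OSD must be stopped while compacting):

    systemctl stop ceph-osd@12
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
    systemctl start ceph-osd@12

    # and, if not already set:
    ceph config set osd bluefs_buffered_io true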

[ceph-users] Re: [IMPORTANT NOTICE] Potential data corruption in Pacific

2021-10-29 Thread Radoslav Milanov
Not everyone is subscribed to low traffic MLs. Something like this should be posted on all lists, I think. On 29.10.2021 at 05:43, Daniel Poelzleithner wrote: On 29/10/2021 11:23, Tobias Fischer wrote: I would propose to either create a separate Mailing list for these kind of Information from

[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-29 Thread Boris Behrens
Hi guys, we just updated the cluster to the latest Octopus, but we still cannot list multipart uploads if there are more than 2k multiparts. Is there any way to show the multiparts and maybe cancel them? On Mon, 25 Oct 2021 at 16:23, Boris Behrens wrote: > Hi Casey, > > thanks a lot for
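
For reference, listing and aborting multipart uploads from the client side usually looks like this with s3cmd (bucket, key and upload id are placeholders); whether the listing works past 2k in-progress uploads is exactly the problem discussed in this thread:

    s3cmd multipart s3://my-bucket                  # list in-progress multipart uploads
    s3cmd abortmp s3://my-bucket/my-object 2~abcd   # abort one upload by its upload id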

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Hello team Below is the error , I am getting once I try to redeploy the same cluster TASK [ceph-mon : recursively fix ownership of monitor directory]
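
The ceph-ansible task that fails here is essentially a recursive chown; a hedged manual equivalent (the path is the default for a mon named after the host and may differ in your deployment):

    chown -R ceph:ceph /var/lib/ceph/mon/ceph-$(hostname -s)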

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
Have you tried to restart one of the OSDs that seems to block PG recovery? I don’t think increasing PGs will help. - Etienne Menguy etienne.men...@croit.io > On 29 Oct 2021, at 11:53, Michel Niyoyita wrote: > > Hello Eugen > > The failure_domain is host level and crush rule is replicated_rule
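
A hedged sketch of restarting a suspect OSD, assuming systemd-managed, non-containerized OSDs (the id is a placeholder):

    systemctl restart ceph-osd@7
    # or just force re-peering without a full restart:
    ceph osd down osd.7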

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Hello Eugen The failure_domain is host level and the crush rule is replicated_rule. During troubleshooting I changed pool 5's PG count from 32 to 128 to see if there would be any change, and it has the default replica count (3). Thanks for your continuous help On Fri, Oct 29, 2021 at 11:44 AM Etienne
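
For completeness, the pg_num change mentioned above is normally done like this (the pool name is a placeholder; on recent releases pgp_num follows pg_num automatically, on older ones it has to be set as well):

    ceph osd pool set volumes pg_num 128
    ceph osd pool set volumes pgp_num 128   # only needed explicitly on older releases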

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
> Is there a way to force the mon to rejoin the quorum? I tried to restart it > but nothing changed. I guess that is the cause, if I am not mistaken. No, but with quorum_status you can check monitor status and whether it’s trying to join the quorum. You may have to use the daemon socket interface (asok
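
A short example of the checks mentioned above; the mon name in the admin socket path is a placeholder for the failing monitor:

    ceph quorum_status --format json-pretty
    # on the host of the failing mon, via its admin socket:
    ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon2.asok mon_status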

[ceph-users] Re: [IMPORTANT NOTICE] Potential data corruption in Pacific

2021-10-29 Thread Daniel Poelzleithner
On 29/10/2021 11:23, Tobias Fischer wrote: > I would propose to either create a separate Mailing list for these kind > of Information from the Ceph Dev Community or use a Mailing list where > not that much is happening, e.g. ceph-announce> > What do you think? I like that, low traffic ML are

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Dear Etienne Is there a way to force the mon to rejoin the quorum? I tried to restart it but nothing changed. I guess that is the cause, if I am not mistaken. Below is the pg query output root@ceph-mon2:~# ceph pg 5.10 query { "snap_trimq": "[]", "snap_trimq_len": 0, "state":

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Eugen Block
Also what does the crush rule look like for pool 5 and what is the failure-domain? Quoting Etienne Menguy: With “ceph pg x.y query” you can check why it’s complaining. x.y is the pg id, like 5.77. It would also be interesting to check why the mon fails to rejoin the quorum, it may give you hints
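
Checking the rule and failure domain for a pool typically looks like this (pool and rule names are placeholders; the rule dump shows the chooseleaf step and hence the failure domain):

    ceph osd pool get volumes crush_rule
    ceph osd crush rule dump replicated_rule
    ceph osd pool get volumes size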

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
With “ceph pg x.y query” you can check why it’s complaining. x.y is the pg id, like 5.77. It would also be interesting to check why the mon fails to rejoin the quorum, it may give you hints at your OSD issues. - Etienne Menguy etienne.men...@croit.io > On 29 Oct 2021, at 10:34, Michel Niyoyita

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Hello Etienne This is the ceph -s output root@ceph-mon1:~# ceph -s cluster: id: 43f5d6b4-74b0-4281-92ab-940829d3ee5e health: HEALTH_ERR 1/3 mons down, quorum ceph-mon1,ceph-mon3 14/47681 objects unfound (0.029%) 1 scrub errors

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread Etienne Menguy
Hi, Please share “ceph -s” output. - Etienne Menguy etienne.men...@croit.io > On 29 Oct 2021, at 10:03, Michel Niyoyita wrote: > > Hello team > > I am running a ceph cluster with 3 monitors and 4 OSDs nodes running 3osd > each , I deployed my ceph cluster using ansible and ubuntu 20.04 as

[ceph-users] Cluster Health error's status

2021-10-29 Thread Michel Niyoyita
Hello team I am running a ceph cluster with 3 monitors and 4 OSD nodes running 3 OSDs each. I deployed my ceph cluster using ansible and Ubuntu 20.04 as OS; the ceph version is Octopus. Yesterday, my server which hosts the OSD nodes restarted because of a power issue, and to come back to its status

[ceph-users] Re: 2 OSDs Near Full, Others Under 50%

2021-10-29 Thread Janne Johansson
On Thu, 28 Oct 2021 at 22:25, Dave Hall wrote: > Hello, > I have a Nautilus 14.2.21 cluster with 48 x 12TB OSDs across 6 nodes, with > 3 new nodes and 24 more OSDs ready to come online. The bulk of my pools > are EC 8+2 with a failure domain of OSD. > Until yesterday one of the original 48 OSDs
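
For the uneven OSD utilization described above, the usual remedies on Nautilus are the upmap balancer or a one-shot utilization-based reweight; a hedged sketch (the balancer needs clients at luminous or newer):

    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status
    # or a coarser, one-shot alternative:
    ceph osd reweight-by-utilization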