[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-01-24 Thread Igor Fedotov
Hey Sebastian, thanks a lot for the update, please see more questions inline. Thanks, Igor On 1/22/2022 2:13 AM, Sebastian Mazza wrote: Hey Igor, thank you for your response and your suggestions. I've tried to simulate every imaginable load that the cluster might have done before the thr

[ceph-users] Ceph RGW 16.2.7 CLI changes

2022-01-24 Thread Александр Махов
I am trying to run a new Ceph cluster with Rados GW using the latest software version, 16.2.7, but when I set up the RGW nodes I found out there are some changes in the CLI compared with version 16.2.4, which I tested before. The following commands are missing in 16.2.7: ceph dashboard set-rgw-api-use

[ceph-users] PG_SLOW_SNAP_TRIMMING and possible storage leakage on 16.2.5

2022-01-24 Thread David Prude
Hello, We have a 5-node, 30-HDD (6 HDDs/node) cluster running 16.2.5. We utilize a snapshot scheme within cephfs that results in 24 hourly snapshots, 7 daily snapshots, and 2 weekly snapshots. This has been running without overt issues for several months. As of this weekend, we started receivin
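The post does not say how the snapshots are scheduled; as one hedged illustration, a 24-hourly / 7-daily / 2-weekly retention scheme can be expressed with the snap_schedule mgr module roughly as follows (an assumption, not the poster's actual setup; syntax per the Pacific docs):

# Assumed sketch only: hourly snapshots at the CephFS root,
# keeping 24 hourly, 7 daily and 2 weekly copies.
ceph mgr module enable snap_schedule
ceph fs snap-schedule add / 1h
ceph fs snap-schedule retention add / 24h7d2w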

[ceph-users] Re: Ceph RGW 16.2.7 CLI changes

2022-01-24 Thread Ernesto Puerta
Hi Александр, Starting with Pacific 16.2.6, cephadm now configures and manages the RGW credentials. You can also trigger that auto-configuration on an upgraded cluster with `ceph dashboard set-rgw-credentials` [docs]
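As a minimal illustration on an upgraded cluster, the single command quoted above is all that is needed to re-trigger the dashboard's RGW credential auto-configuration:

# Re-run the dashboard's RGW credential auto-configuration (Pacific 16.2.6+).
ceph dashboard set-rgw-credentials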

[ceph-users] Using s3website with ceph orch?

2022-01-24 Thread Manuel Holtgrewe
Dear all, I'm trying to configure s3website for a site managed by ceph-orch, following [1] in spirit. I have configured two ingress.rgw services, "ingress.rgw.ext" and "ingress.rgw.ext-website", and point to them via ceph-s3-ext.example.com and ceph-s3-website-ext.example.com in DNS.
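Not stated in the message, but for orientation: the website-facing RGW instances typically also need the static-website options set. A hedged sketch, where the config section name client.rgw.ext-website is a placeholder guess rather than the poster's actual daemon name:

# Assumed sketch: enable static-website handling on the website RGW instances
# and tell them which DNS name serves website-style requests.
ceph config set client.rgw.ext-website rgw_enable_static_website true
ceph config set client.rgw.ext-website rgw_dns_s3website_name ceph-s3-website-ext.example.com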

[ceph-users] Re: PG_SLOW_SNAP_TRIMMING and possible storage leakage on 16.2.5

2022-01-24 Thread Dan van der Ster
Hi David, We observed the same here: https://tracker.ceph.com/issues/52026 You can poke the trimming by repeering the PGs. Also, depending on your hardware, the defaults for osd_snap_trim_sleep might be far too conservative. We use osd_snap_trim_sleep = 0.1 on our mixed hdd block / ssd block.db O
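A short sketch of both suggestions (the PG id is the one quoted in the follow-up below; pick a sleep value suited to your hardware):

# Kick snap trimming on a stuck PG by re-peering it.
ceph pg repeer 3.9b
# Reduce the sleep between trim ops; 0.1 is the value quoted above for mixed hdd/ssd OSDs.
ceph config set osd osd_snap_trim_sleep 0.1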

[ceph-users] Re: PG_SLOW_SNAP_TRIMMING and possible storage leakage on 16.2.5

2022-01-24 Thread David Prude
Dan, Thank you for replying. Since I posted I did some more digging. It really seemed as if snaptrim simply wasn't being processed. The output of "ceph health detail" showed that PG 3.9b had the longest queue. I examined this PG and saw that its primary was osd.8, so I manually restarted that da
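A hedged reconstruction of that diagnostic path (PG and OSD ids as quoted above; the restart line assumes a cephadm/orchestrator-managed daemon, while plain systemd deployments would use systemctl restart ceph-osd@8 instead):

# Confirm which OSD is primary for the slow PG, then bounce that daemon.
ceph pg map 3.9b
ceph orch daemon restart osd.8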

[ceph-users] Re: PG_SLOW_SNAP_TRIMMING and possible storage leakage on 16.2.5

2022-01-24 Thread Dan van der Ster
Hi, Yes, restarting an OSD also works to re-peer and "kick" the snaptrimming process. (In the ticket we first noticed this because snap trimming restarted after an unrelated OSD crashed/restarted). Please feel free to add your experience to that ticket. > monitoring snaptrimq This is from our lo
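The monitoring approach referenced above is not shown; a rough stand-in that sums the per-PG snap trim queue lengths from pg dump could look like this (field names as of Pacific, jq assumed; verify against your own output):

# Total snap_trimq_len across all PGs - a crude trim-backlog indicator.
ceph pg dump -f json 2>/dev/null | jq '[.pg_map.pg_stats[].snap_trimq_len] | add'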

[ceph-users] Fwd: Lots of OSDs crashlooping (DRAFT - feedback?)

2022-01-24 Thread Benjamin Staffin
I have a cluster where 46 out of 120 OSDs have begun crash looping with the same stack trace (see pasted output below). The cluster is in a very bad state with this many OSDs down, unsurprisingly. The day before this problem showed up, the k8s cluster was under extreme memory pressure and a lot o

[ceph-users] Re: Lots of OSDs crashlooping (DRAFT - feedback?)

2022-01-24 Thread Benjamin Staffin
oh jeez, sorry about the subject line - I forgot to change it after asking a coworker to review the message. This is not a draft. On Mon, Jan 24, 2022 at 6:44 PM Benjamin Staffin wrote: > I have a cluster where 46 out of 120 OSDs have begun crash looping with > the same stack trace (see pasted

[ceph-users] Re: Multipath and cephadm

2022-01-24 Thread Michal Strnad
Hi all, we still have a problem adding any disk behind multipath. We tried an osd-spec in yml and ceph orch daemon add osd with mpath, dm-X, or sdX devices (for sdX we disabled the multipath daemon and flushed the multipath table). Do you have any idea? ceph orch daemon add osd serverX:/dev/mapper/mpathm Runti
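For comparison, a hedged sketch of the yml/spec route mentioned above, passing the multipath device by explicit path (file name, service_id and host are placeholders, and this reproduces the attempted approach rather than a confirmed fix):

# Assumed drive-group spec using an explicit path to the mpath device,
# then applied via the orchestrator.
cat > osd-mpath.yml <<'EOF'
service_type: osd
service_id: mpath-osds
placement:
  hosts:
    - serverX
spec:
  data_devices:
    paths:
      - /dev/mapper/mpathm
EOF
ceph orch apply -i osd-mpath.yml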