[ceph-users] Re: ceph recipe for nfs exports

2024-04-24 Thread Ceph . io
Wow, you made it farther than I did. I got it installed, added hosts, then NOTHING. It showed there were physical disks on the hosts but wouldn't create the OSDs. Command was accepted, but NOTHING happened. No output, no error, no NOTHING. I fought with it for over a week and finally gave
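
A minimal sketch of the commands usually used to see why cephadm is not creating OSDs; the arguments are generic placeholders, not taken from this report:

  # list the devices cephadm can see and whether they count as "available"
  ceph orch device ls --wide

  # ask cephadm to turn every available, unused device into an OSD
  ceph orch apply osd --all-available-devices

  # if nothing happens, the cephadm log usually says why
  ceph log last 50 debug cephadm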

[ceph-users] Remove an OSD with hardware issue caused rgw 503

2024-04-24 Thread Mary Zhang
Hi, We recently removed an OSD from our Ceph cluster. Its underlying disk has a hardware issue. We used the command: ceph orch osd rm osd_id --zap During the process, the ceph cluster sometimes enters a warning state with slow ops on this OSD. Our rgw also failed to respond to requests and returned 503.
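
A minimal sketch of the removal flow around the command quoted above, assuming a placeholder OSD id of 12:

  # mark the OSD out first so data drains off it gracefully
  ceph osd out 12

  # remove it; --zap wipes the underlying device once it is drained
  ceph orch osd rm 12 --zap

  # watch the drain/removal progress and overall cluster health
  ceph orch osd rm status
  ceph -s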

[ceph-users] Re: Recoveries without any misplaced objects?

2024-04-24 Thread David Orman
It is RGW, but the index is on a different pool. Not seeing any key/s being reported in recovery. We've definitely had OSDs flap multiple times. David On Wed, Apr 24, 2024, at 16:48, Anthony D'Atri wrote: > Do you see *keys* aka omap traffic? Especially if you have RGW set up? > >> On Apr 24,

[ceph-users] Re: Recoveries without any misplaced objects?

2024-04-24 Thread Anthony D'Atri
Do you see *keys* aka omap traffic? Especially if you have RGW set up? > On Apr 24, 2024, at 15:37, David Orman wrote: > > Did you ever figure out what was happening here? > > David > > On Mon, May 29, 2023, at 07:16, Hector Martin wrote: >> On 29/05/2023 20.55, Anthony D'Atri wrote: >>>

[ceph-users] Re: [EXTERN] cache pressure?

2024-04-24 Thread Dietmar Rieder
Hi Erich, in our case the "client failing to respond to cache pressure" situation is/was often caused by users who have vscode connecting via ssh to our HPC head node. vscode makes heavy use of file watchers and we have seen users with > 400k watchers. All these watched files must be held in
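
A rough sketch of how such clients can be spotted; the MDS name is a placeholder, not from this thread:

  # find the active MDS for the filesystem
  ceph fs status

  # list client sessions; num_caps shows which client holds the most
  ceph tell mds.<active_mds_name> session ls

  # on the suspect client, check the inotify watcher limit vscode runs into
  sysctl fs.inotify.max_user_watches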

[ceph-users] Re: Recoveries without any misplaced objects?

2024-04-24 Thread David Orman
Did you ever figure out what was happening here? David On Mon, May 29, 2023, at 07:16, Hector Martin wrote: > On 29/05/2023 20.55, Anthony D'Atri wrote: >> Check the uptime for the OSDs in question > > I restarted all my OSDs within the past 10 days or so. Maybe OSD > restarts are somehow

[ceph-users] Slow/blocked reads and writes

2024-04-24 Thread Fábio Sato
Hello all, I am trying to troubleshoot a ceph cluster (version 18.2.2) where users are reporting slow and blocked reads and writes. When running "ceph status" I am seeing many warnings about its health state: cluster: id: cc881230-e0dd-11ee-aa9e-37c4e4e5e14b health: HEALTH_WARN
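
A hedged sketch of the first commands typically run to narrow down slow ops; none of this is from the cluster described above:

  # expand the warnings summarized by "ceph status"
  ceph health detail

  # per-OSD latency, to spot a single slow disk
  ceph osd perf

  # PGs that are stuck or carrying blocked requests
  ceph pg dump_stuck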

[ceph-users] Re: Orchestrator not automating services / OSD issue

2024-04-24 Thread Michael Baer
Thanks Frédéric, Going through your steps helped me narrow down the issue. Oddly, it looks to be a network issue with the new host. Most things connect okay (ssh, ping), but when the data stream gets too big, the connections just hang. And it seems to be host specific, as the other storage hosts
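
Hangs that only appear once the data stream grows are often an MTU mismatch; a hedged check, with the host and interface names as placeholders:

  # send a large, non-fragmentable packet to the new host
  # (8972 = 9000-byte jumbo frame minus 28 bytes of IP/ICMP headers)
  ping -M do -s 8972 <new-host>

  # compare the MTU configured on both ends
  ip link show <interface>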

[ceph-users] Re: ceph recipe for nfs exports

2024-04-24 Thread Adam King
> > - Although I can mount the export I can't write on it > > What error are you getting trying to do the write? The way you set things up doesn't look too different from one of our integration tests for ingress over nfs (
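
A hedged example of reproducing the failed write from a client; the ingress VIP and pseudo path are placeholders, not the poster's values:

  # mount through the ingress virtual IP (cephadm's ganesha exports speak NFSv4.1+)
  mount -t nfs -o nfsvers=4.1 <ingress_vip>:/<pseudo_path> /mnt/cephnfs

  # attempt a write and note the exact error returned
  touch /mnt/cephnfs/testfile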

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-04-24 Thread Eugen Block
In addition to Nico's response, three years ago I wrote a blog post [1] about that topic, maybe that can help as well. It might be a bit outdated, what it definitely doesn't contain is this command from the docs [2] once the server has been re-added to the host list: ceph cephadm osd
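
The truncated command is presumably the cephadm one for re-activating existing OSDs on a rebuilt host; a sketch with placeholder host details:

  # re-add the reinstalled server, then activate the OSDs already on its disks
  ceph orch host add <hostname> <ip>
  ceph cephadm osd activate <hostname>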

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-04-24 Thread Bailey Allison
Hey Peter, A simple ceph-volume lvm activate should get all of the OSDs back up and running once you install the proper packages/restore the ceph config file/etc. If the node was also a mon/mgr you can simply re-add those services. Regards, Bailey > -Original Message- > From: Peter
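
A minimal sketch of that package-based recovery path, assuming /etc/ceph/ceph.conf and the keyrings have been restored first:

  # scan the LVM volumes and start every OSD found on the box
  ceph-volume lvm activate --all

  # confirm the OSDs rejoined the cluster
  ceph osd tree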

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-04-24 Thread Nico Schottelius
Hey Peter, the /var/lib/ceph directories mainly contain "meta data" that, depending on the ceph version and osd setup, can even be residing on tmpfs by default. Even if the data was on-disk, they are easy to recreate:

[ceph-users] Reconstructing an OSD server when the boot OS is corrupted

2024-04-24 Thread Peter van Heusden
Dear Ceph Community We have 5 OSD servers running Ceph v15.2.17. The host operating system is Ubuntu 20.04. One of the servers has suffered corruption to its boot operating system. Using a system rescue disk it is possible to mount the root filesystem but it is not possible to boot the operating

[ceph-users] Re: ceph-users Digest, Vol 118, Issue 85

2024-04-24 Thread duluxoz
Hi Eugen, Thank you for a viable solution to our underlying issue - I'll attempt to implement it shortly. :-) However, with all the respect in the world, I believe you are incorrect when you say the doco is correct (but I will be more than happy to be proven wrong). :-) The relevant text

[ceph-users] Re: Latest Doco Out Of Date?

2024-04-24 Thread Eugen Block
Hi, I fully agree that there should be a smoother way to update client caps. But regarding the misleading terms, the docs do mention: This is because the command fs authorize becomes ambiguous. So they are aware of the current state, but I don't know if there's any work in progress to

[ceph-users] ceph recipe for nfs exports

2024-04-24 Thread Roberto Maggi @ Debian
Hi you all, I'm almost new to ceph and I'm understanding, day by day, why the official support is so expensive :) I'm setting up a ceph nfs network cluster whose recipe can be found here below. ### --> cluster creation cephadm bootstrap --mon-ip 10.20.20.81
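
A hedged sketch of the usual cephadm NFS steps that follow the bootstrap line above; the host names, cluster id and paths are placeholders, not the poster's actual recipe:

  # bootstrap (as above), then add the remaining hosts
  cephadm bootstrap --mon-ip 10.20.20.81
  ceph orch host add <host2> <ip2>

  # backing filesystem, NFS cluster, and one export
  ceph fs volume create cephfs
  ceph nfs cluster create mynfs "<host1>,<host2>"
  ceph nfs export create cephfs --cluster-id mynfs --pseudo-path /data --fs-name cephfs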

[ceph-users] Re: Latest Doco Out Of Date?

2024-04-24 Thread Frank Schilder
Hi Eugen, I would ask for a slight change here: > If a client already has a capability for file-system name a and path > dir1, running fs authorize again for FS name a but path dir2, > instead of modifying the capabilities client already holds, a new > cap for dir2 will be granted The

[ceph-users] Re: stretched cluster new pool and second pool with nvme

2024-04-24 Thread Eugen Block
Oh, I see. Unfortunately, I don't have a cluster in stretch mode so I can't really test that. Thanks for pointing to the tracker. Zitat von Stefan Kooman : On 23-04-2024 14:40, Eugen Block wrote: Hi, what's the right way to add another pool? create pool with 4/2 and use the rule for the
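
For reference, a hedged sketch of creating a 4/2 replicated pool bound to a stretch-mode CRUSH rule; the pool name, PG count and rule name are placeholders:

  ceph osd pool create mypool 128 128 replicated <stretch_rule>
  ceph osd pool set mypool size 4
  ceph osd pool set mypool min_size 2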

[ceph-users] Re: Latest Doco Out Of Date?

2024-04-24 Thread Eugen Block
Hi, I believe the docs [2] are okay; running 'ceph fs authorize' will overwrite the existing caps, not add more caps to the client: Capabilities can be modified by running fs authorize only in the case when read/write permissions must be changed. If a client already has a
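
A small illustration of the behaviour under discussion, with placeholder filesystem and client names:

  # first grant: rw on /dir1
  ceph fs authorize a client.foo /dir1 rw

  # second grant on another path -- the thread is about whether this
  # extends the existing caps or replaces them
  ceph fs authorize a client.foo /dir2 rw

  # either way, this shows what the client ended up holding
  ceph auth get client.foo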

[ceph-users] Re: Orchestrator not automating services / OSD issue

2024-04-24 Thread Frédéric Nass
Hello Michael, You can try this: 1/ check that the host shows up on ceph orch ls with the right label 'osds' 2/ check that the host is OK with ceph cephadm check-host . It should look like: (None) ok podman (/usr/bin/podman) version 4.6.1 is present systemctl is present lvcreate is present
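
A sketch of those first two checks, assuming the host listing is done with ceph orch host ls and using a placeholder hostname:

  # hosts known to the orchestrator, with their labels
  ceph orch host ls

  # sanity-check cephadm prerequisites on the new host
  ceph cephadm check-host <hostname>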