[ceph-users] Re: Error EINVAL: check-host failed - Failed to add host

2024-06-04 Thread isnraju26
Thanks for the reply @Eugen Block. Yes, something else is wrong in my server, but I have no clue why it's failing or what the cause of the bootstrap failure is. I was able to bootstrap with ed keys on another server.
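For reference, a minimal sketch of bootstrapping with a pre-generated ed25519 key pair, assuming cephadm's documented --ssh-private-key/--ssh-public-key options and placeholder paths/IP:

    ssh-keygen -t ed25519 -f /root/.ssh/ceph_ed25519 -N ''
    cephadm bootstrap --mon-ip <MON_IP> \
        --ssh-private-key /root/.ssh/ceph_ed25519 \
        --ssh-public-key /root/.ssh/ceph_ed25519.pub

If adding a host still fails at check-host, running cephadm check-host on the target node usually shows which prerequisite (container runtime, time sync, hostname resolution, lvm tools) is missing.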

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread nbarbier
First, thanks Xiubo for your feedback! To go further on the points raised by Sake: - How does this happen? -> There were no preliminary signs before the incident. - Is this avoidable? -> Good question, I'd also like to know how! - How to fix the issue? -> So far, no fix nor workaround from w

[ceph-users] Re: Adding new OSDs - also adding PGs?

2024-06-04 Thread Wesley Dillingham
It depends on the cluster. In general I would say that if your PG count is already good in terms of PGs per OSD (say between 100 and 200 each), add capacity and then re-evaluate your PG count afterwards. If you have a lot of time before the gear is racked and could undergo some PG splits before the new g
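A minimal sketch of checking the per-OSD PG count before and after the expansion (the pool name below is a placeholder):

    ceph osd df tree                            # PGS column shows placement groups per OSD
    ceph osd pool autoscale-status              # what the autoscaler would do
    ceph osd pool set cephfs_data pg_num 2048   # manual split after re-evaluating, example value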

[ceph-users] Re: reef 18.2.3 QE validation status

2024-06-04 Thread Laura Flores
Rados results were approved, and we successfully upgraded the gibba cluster. Now waiting on @Dan Mick to upgrade the LRC. On Thu, May 30, 2024 at 8:32 PM Yuri Weinstein wrote: > I reran rados on the fix https://github.com/ceph/ceph/pull/57794/commits > and seeking approvals from Radek and Laur

[ceph-users] Adding new OSDs - also adding PGs?

2024-06-04 Thread Erich Weiler
Hi All, I'm going to be adding a bunch of OSDs to our cephfs cluster shortly (increasing the total size by 50%). We're on reef, and will be deploying using the cephadm method, and the OSDs are exactly the same size and disk type as the current ones. So, after adding the new OSDs, my underst

[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Lukasz Borek
> You could check if your devices support NVMe namespaces and create more than one namespace on the device. Wow, tricky. Will give it a try. Thanks! Łukasz Borek luk...@borek.org.pl On Tue, 4 Jun 2024 at 16:26, Robert Sander wrote: > Hi, > On 6/4/24 16:15, Anthony D'Atri wrote: >

[ceph-users] Excessively Chatty Daemons RHCS v5

2024-06-04 Thread Joshua Arulsamy
Hi, I recently upgraded my RHCS cluster from v4 to v5 and moved to containerized daemons (podman) along the way. I noticed that there are a huge number of logs going to journald on each of my hosts. I am unsure why there are so many. I tried changing the logging level at runtime with commands lik
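A hedged sketch of reducing daemon verbosity via the config database (subsystems and levels below are examples, not tuned recommendations):

    ceph config set global debug_osd 1/5
    ceph config set global debug_mon 1/5
    ceph config set global debug_ms 0/0
    # containerized daemons log to stderr, which ends up in journald;
    # the cephadm logging docs describe switching to file logging instead:
    ceph config set global log_to_stderr false
    ceph config set global log_to_file true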

[ceph-users] Setting hostnames for zonegroups via cephadm / rgw mgr module?

2024-06-04 Thread Matthew Vernon
Hi, I'm using reef (18.2.2); the docs talk about setting up a multi-site setup with a spec file e.g. rgw_realm: apus rgw_zonegroup: apus_zg rgw_zone: eqiad placement: label: "rgw" but I don't think it's possible to configure the "hostnames" parameter of the zonegroup (and thus control what
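If the spec file really has no hostnames field, a common fallback is editing the zonegroup directly with radosgw-admin (zonegroup name and hostname below are placeholders):

    radosgw-admin zonegroup get --rgw-zonegroup=apus_zg > zg.json
    # edit the "hostnames" array in zg.json, e.g. ["rgw.example.org"]
    radosgw-admin zonegroup set --rgw-zonegroup=apus_zg --infile zg.json
    radosgw-admin period update --commit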

[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Robert Sander
Hi, On 6/4/24 16:15, Anthony D'Atri wrote: > I've wondered for years what the practical differences are between using a namespace and a conventional partition. Namespaces show up as separate block devices in the kernel. The orchestrator will not touch any devices that contain a partition tab

[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Anthony D'Atri
Or partition, or use LVM. I've wondered for years what the practical differences are between using a namespace and a conventional partition. > On Jun 4, 2024, at 07:59, Robert Sander wrote: > > On 6/4/24 12:47, Lukasz Borek wrote: > >> Using cephadm, is it possible to cut part of the NVME dr
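A rough sketch of the LVM route, carving a DB logical volume out of the NVMe by hand and handing it to ceph-volume (device names and size are placeholders; OSDs created this way are not managed by a cephadm OSD service spec):

    vgcreate ceph-db /dev/nvme0n1
    lvcreate -L 120G -n db-osd0 ceph-db
    ceph-volume lvm create --data /dev/sdb --block.db ceph-db/db-osd0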

[ceph-users] problem with mgr prometheus module

2024-06-04 Thread Dario Graña
Hi all! I'm running Ceph Quincy 17.2.7 in a cluster. On Monday I updated the OS from AlmaLinux 9.3 to 9.4; since then Grafana shows a "No Data" message in all Ceph-related panels, while, for example, the node information is still fine (Host Detail Dashboard). I have redeployed the mgr service with cepha
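A few checks that usually narrow down where the metrics stop (hostname and port are assumptions, 9283 being the default mgr prometheus port):

    ceph mgr module ls | grep prometheus       # module still enabled?
    ceph mgr services                          # URL of the active mgr's exporter
    curl -s http://<active-mgr-host>:9283/metrics | head
    ceph orch ps --daemon-type prometheus      # is the Prometheus container itself healthy?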

[ceph-users] Re: Update OS with clean install

2024-06-04 Thread Sake Ceph
Hi Robert, I tried, but that doesn't work :( Using exit maintenance mode results in the error: "missing 2 required positional arguments: 'hostname' and 'addr'" But running the command a second time, it looks like it works, but then I get errors with starting the containers. The start up fail
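For context, the maintenance-mode commands being discussed, as documented (hostname is a placeholder):

    ceph orch host maintenance enter <host> [--force]
    ceph orch host maintenance exit <host>
    ceph orch host ls        # the host's maintenance status should clear after a successful exit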

[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-06-04 Thread Eugen Block
Hi, I don't have much to contribute, but according to the source code [1] this seems to be a non-fatal message: void CreatePrimaryRequest::handle_unlink_peer(int r) { CephContext *cct = m_image_ctx->cct; ldout(cct, 15) << "r=" << r << dendl; if (r < 0) { lderr(cct) << "failed to un

[ceph-users] Re: MDS crashes due to damaged metadata

2024-06-04 Thread Stolte, Felix
Hi Patrick, it has been a year now and we have not had a single crash since upgrading to 16.2.13. We still have the 19 corrupted files which are reported by 'damage ls'. Is it now possible to delete the corrupted files without taking the filesystem offline? On 22.05.2023 at 20:23, Patri
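A hedged sketch of inspecting and clearing damage entries online (the filesystem name is a placeholder; removing an entry only clears it from the damage table, it does not restore the file):

    ceph tell mds.<fsname>:0 damage ls
    ceph tell mds.<fsname>:0 scrub start / recursive,repair
    ceph tell mds.<fsname>:0 damage rm <damage_id>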

[ceph-users] Re: Update OS with clean install

2024-06-04 Thread Robert Sander
Hi, On 6/4/24 14:35, Sake Ceph wrote: > * Store host labels (we use labels to deploy the services) > * Fail-over MDS and MGR services if running on the host > * Remove host from cluster > * Add host to cluster again with correct labels AFAIK the steps above are not necessary. It should be sufficient

[ceph-users] Update OS with clean install

2024-06-04 Thread Sake Ceph
Hi all, I'm working on a way to automate the OS upgrade of our hosts. This happens with a complete reinstall of the OS. What is the correct way to do this? At the moment I'm using the following: * Store host labels (we use labels to deploy the services) * Fail-over MDS and MGR services if running
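A rough sketch of the capture-and-re-add part of that workflow, with placeholder host name, address and labels:

    ceph orch host ls --format yaml             # record the current labels before the reinstall
    ceph orch host add host01 10.0.0.11 --labels _admin
    ceph orch host label add host01 mds         # re-apply the remaining labels one by one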

[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Robert Sander
On 6/4/24 12:47, Lukasz Borek wrote: > Using cephadm, is it possible to cut part of the NVMe drive for the OSD and leave the rest of the space for RocksDB/WAL? Not out of the box. You could check if your devices support NVMe namespaces and create more than one namespace on the device. The kernel then sees m
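A hedged sketch with nvme-cli, assuming the controller supports namespace management (device, sizes in blocks and controller ID are placeholders):

    nvme id-ctrl /dev/nvme0 -H | grep -i 'ns management'    # capability check
    nvme create-ns /dev/nvme0 --nsze=97656250 --ncap=97656250 --flbas=0
    nvme attach-ns /dev/nvme0 --namespace-id=2 --controllers=0x1
    nvme list                                               # new namespace shows up as e.g. /dev/nvme0n2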

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Sake Ceph
Hi Xiubo, thank you for the explanation! This won't be an issue for us, but it made me think twice :) Kind regards, Sake > On 04-06-2024 12:30 CEST, Xiubo Li wrote: > > On 6/4/24 15:20, Sake Ceph wrote: > > Hi, > > A little break into this thread, but I have some questions: > > * How d

[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Lukasz Borek
> I have certainly seen cases where the OMAPs have not stayed within the RocksDB/WAL NVMe space and have been going down to disk. How do I monitor OMAP size and check whether it stays on the NVMe? > The OP's numbers suggest IIRC like 120GB-ish for WAL+DB, though depending on workload spillover coul
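A few checks that, as far as I know, cover this (the OSD id is a placeholder; the daemon command has to run on the OSD's host):

    ceph health detail | grep -i spillover     # BLUEFS_SPILLOVER warning when the DB overflows to the slow device
    ceph osd df                                # OMAP and META columns per OSD
    ceph daemon osd.0 perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'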

[ceph-users] Re: Error EINVAL: check-host failed - Failed to add host

2024-06-04 Thread Eugen Block
Hi, I think there's something else wrong with your setup; I could bootstrap a cluster with ed keys without an issue: ceph:~ # ssh-keygen -t ed25519 Generating public/private ed25519 key pair. ceph:~ # cephadm --image quay.io/ceph/ceph:v18.2.2 bootstrap --mon-ip [IP] [some more options] --s

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Xiubo Li
On 6/4/24 15:20, Sake Ceph wrote: > Hi, > A little break into this thread, but I have some questions: > * How does this happen, that the filesystem gets into read-only mode? For a detailed explanation you can refer to the Ceph PR: https://github.com/ceph/ceph/pull/55421. > * Is this avoidable? > * How-

[ceph-users] Testing CEPH scrubbing / self-healing capabilities

2024-06-04 Thread Petr Bena
Hello, I wanted to try out (in a lab Ceph setup) what exactly happens when part of the data on an OSD disk gets corrupted. I created a simple test where I went through the block device data until I found something that resembled user data (using dd and hexdump) (/dev/sdd is a block devic
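For the steps after the corruption, a minimal sketch (the PG id is a placeholder):

    ceph pg deep-scrub 2.1f                    # force a deep scrub of the PG holding the object
    ceph health detail                         # should now report scrub errors / inconsistent PGs
    rados list-inconsistent-obj 2.1f --format=json-pretty
    ceph pg repair 2.1f                        # rebuild the bad replica from healthy copies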

[ceph-users] Re: stretched cluster new pool and second pool with nvme

2024-06-04 Thread Eugen Block
What does your crush rule look like right now, exactly? I assume it's supposed to distribute data across two sites, and since one site is missing, the PGs stay in a degraded state until the site comes back up. You would need to either change the crush rule or assign a different one to that pool whic
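A minimal sketch of inspecting and switching the rule (pool and rule names are placeholders):

    ceph osd pool get <pool> crush_rule
    ceph osd crush rule dump <current_rule>
    ceph osd crush rule ls
    ceph osd pool set <pool> crush_rule <rule_matching_the_remaining_site>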

[ceph-users] Re: Missing ceph data

2024-06-04 Thread Eugen Block
Hi, if you can verify which data has been removed, and that the client is still connected, you might find out who was responsible for it. Do you know which files in which directories are missing? Does that maybe already reveal one or several users/clients? You can query the MDS daemons and insp
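A minimal sketch of listing the connected CephFS clients to match them against the missing paths (the filesystem name is a placeholder):

    ceph tell mds.<fsname>:0 session ls        # connected clients with ids, addresses and mount points
    ceph fs status <fsname>                    # overview incl. clients per MDS rank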

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Sake Ceph
Hi, A little break into this thread, but I have some questions: * How does this happen, that the filesystem gets into read-only mode? * Is this avoidable? * How to fix the issue, because I didn't see a workaround in the mentioned tracker (or I missed it)? * With this bug around, should you use c