[ceph-users] MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-07 Thread Dejan Lesjak
Hello, We have cephfs with two active MDS. Currently rank 1 is repeatedly crashing with FAILED ceph_assert(p->first <= start) in the md_log_replay thread. Is there any way to work around this and get back to an accessible file system, or should we start with disaster recovery? It seems similar to
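
For reference, a minimal sketch of the documented CephFS journal recovery path (file system name and rank spec are placeholders; journal reset discards log events, so this belongs in a deliberate disaster-recovery procedure, not a quick workaround):

    # back up the rank 1 journal before touching anything
    cephfs-journal-tool --rank=<fs_name>:1 journal export backup.bin
    cephfs-journal-tool --rank=<fs_name>:1 journal inspect
    # destructive steps: salvage what is recoverable, then reset the journal
    cephfs-journal-tool --rank=<fs_name>:1 event recover_dentries summary
    cephfs-journal-tool --rank=<fs_name>:1 journal reset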

[ceph-users] Numa pinning best practices

2024-05-07 Thread Szabo, Istvan (Agoda)
Hi, I haven't really found a proper description of how to pin OSDs to NUMA nodes on a 2-socket system, only this: https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments#Ceph-Storage-Node-NUMA-Tuning
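
For what it's worth, Ceph does expose a per-daemon osd_numa_node option; a minimal sketch for a 2-socket box (NIC name and OSD id are examples):

    lscpu | grep NUMA                          # map cores to NUMA nodes
    cat /sys/class/net/eth0/device/numa_node   # which node the NIC sits on
    ceph config set osd.0 osd_numa_node 0      # pin osd.0 to node 0
    ceph orch daemon restart osd.0             # takes effect on restart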

[ceph-users] Re: Removed host in maintenance mode

2024-05-07 Thread Eugen Block
Error EINVAL: hostname is online, please remove host without --offline. This is strange, why is it online? I thought you couldn't get it to boot according to your first message. What is the current output of 'ceph orch host ls' and is it still present? Have you failed over the mgr, just
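
A minimal check sequence for that situation (host name is a placeholder; on older releases 'ceph mgr fail' needs the mgr id as an argument):

    ceph orch host ls                      # what state does cephadm report?
    ceph mgr fail                          # fail over the active mgr to refresh its view
    ceph orch host rm <hostname> --force   # retry once the state is consistent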

[ceph-users] Problem with take-over-existing-cluster.yml playbook

2024-05-07 Thread vladimir franciz blando
I know that only a few are using this script, but I'm trying my luck here in case someone has the same issue as mine. But first, who has successfully used this script and what version did you use? I'm using this guide on my test environment (
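
For anyone comparing notes, the usual invocation is roughly this, assuming a ceph-ansible checkout on the stable-* branch matching the cluster's release (inventory path is a placeholder):

    git clone https://github.com/ceph/ceph-ansible.git
    cd ceph-ansible    # check out the branch matching your Ceph version
    ansible-playbook -i <inventory> infrastructure-playbooks/take-over-existing-cluster.yml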

[ceph-users] Re: Removed host in maintenance mode

2024-05-07 Thread Johan
Looking at the history I first tried ceph orch host rm hostname --offline --force and then ceph orch host rm hostname --force The second command must have removed the host (partially) because I didn't try any other commands after that. Now when I try these commands again, offline gives

[ceph-users] cephadm upgrade: heartbeat failures not considered

2024-05-07 Thread Eugen Block
Hi, we're facing an issue during upgrades (and sometimes server reboots); it appears to occur when (at least) one of the MONs has to do a full sync. And I'm wondering if the upgrade procedure could be improved in that regard; I'll come back to that later. First, I'll try to summarize the
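
A sketch of how to spot a synchronizing mon and hold the upgrade while it catches up (mon id is a placeholder):

    ceph orch upgrade status
    ceph tell mon.<id> mon_status | grep state   # "synchronizing" => full sync running
    ceph orch upgrade pause                      # wait for the mon to rejoin quorum
    ceph orch upgrade resume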

[ceph-users] Re: [EXTERN] Re: cache pressure?

2024-05-07 Thread Erich Weiler
I still saw client cache pressure messages, although I think it did in general help a bit. What I additionally just did (like 5 minutes ago), was reduce "mds_recall_max_caps" from 30,000 to 10,000 after looking at this post: https://www.spinics.net/lists/ceph-users/msg73188.html And will
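
For reference, that runtime change is a single config set (values as discussed above):

    ceph config set mds mds_recall_max_caps 10000
    ceph config get mds mds_recall_max_caps      # confirm the active value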

[ceph-users] Re: [EXTERN] Re: cache pressure?

2024-05-07 Thread Dietmar Rieder
On 4/26/24 23:51, Erich Weiler wrote: As Dietmar said, VS Code may cause this. Quite funny to read, actually, because we've been dealing with this issue for over a year, and yesterday was the very first time Ceph complained about a client and we saw VS Code's remote stuff running. Coincidence.
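
A quick way to see which client a cache-pressure warning points at, assuming a release whose ceph tell accepts the <fs_name>:<rank> spec (session metadata includes hostname and mount point):

    ceph health detail                      # names the client id behind the warning
    ceph tell mds.<fs_name>:0 session ls    # match the id to a host and mount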

[ceph-users] Guidance on using large RBD volumes - NTFS

2024-05-07 Thread Robert W. Eckert
Hi - in my home I have been running cephfs for a few years with reasonably good performance; however, exposing cephfs via SMB has been hit and miss. So I thought I could carve out space for an RBD device to share from a Windows machine. My setup: Ceph 18.2.2 deployed using
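
A minimal sketch of the RBD-on-Windows route, assuming the Ceph for Windows (WNBD) driver is installed on the client; pool and image names are examples, and NTFS volumes over 2 TB need GPT partitioning:

    # on a cluster node
    ceph osd pool create rbdpool
    rbd pool init rbdpool
    rbd create rbdpool/winshare --size 4T
    # on the Windows host
    rbd device map rbdpool/winshare
    # then online the disk and format it NTFS in Disk Management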

[ceph-users] Re: RBD Mirroring with Journaling and Snapshot mechanism

2024-05-07 Thread Eugen Block
Hi, I'm not the biggest rbd-mirror expert. As I understand it, if you use one-way mirroring you can fail over to the remote site and continue to work there, but there's no failing back to the primary site. You would need to stop client IO on DR, demote the image and then import the remote images
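
A sketch of the failback sequence, assuming an rbd-mirror daemon can replicate toward the original site (pool and image names are placeholders):

    # on the recovered original primary: drop its stale copy and resync from DR
    rbd mirror image demote mypool/myimage
    rbd mirror image resync mypool/myimage
    # once resynced: demote on DR, then promote on the original primary
    rbd mirror image demote mypool/myimage     # DR site
    rbd mirror image promote mypool/myimage    # original site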

[ceph-users] Re: Removed host in maintenance mode

2024-05-07 Thread Eugen Block
Hi, did you remove the host from the host list [0]? ceph orch host rm <hostname> [--force] [--offline] [0] https://docs.ceph.com/en/latest/cephadm/host-management/#offline-host-removal Quoting Johan: Hi all, In my small cluster of 6 hosts I had troubles with a host (osd:s) and was planning to
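
The offline variant from those docs, for completeness (host name is a placeholder):

    ceph orch ps <hostname>      # daemons cephadm still maps to the host
    ceph orch host rm <hostname> --offline --force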

[ceph-users] Removed host in maintenance mode

2024-05-07 Thread Johan
Hi all, In my small cluster of 6 hosts I had troubles with a host (osd:s) and was planning to remove it from the cluster. Before I got to do that I needed to power down this host and therefore put it in maintenance mode. Due to some mistakes on my part I couldn't boot the host again and
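
For reference, the maintenance-mode commands involved (host name is a placeholder):

    ceph orch host maintenance enter <hostname>
    # ...power down, repair, boot...
    ceph orch host maintenance exit <hostname>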

[ceph-users] Re: Dashboard issue slowing to a crawl - active ceph mgr process spiking to 600%+

2024-05-07 Thread Eugen Block
Hi, it's a bit much output to scan through; I'd recommend omitting all unnecessary information before pasting. Anyway, this sticks out: 2024-05-01T15:49:26.977+ 7f85688e8700 0 [dashboard ERROR frontend.error] (https://172.20.2.30:8443/#/login): Http failure response for
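
Two low-risk checks that can narrow this down, sketched under the assumption that a standby mgr exists:

    ceph mgr fail                        # fail over to a standby, see if the spike follows
    ceph mgr module disable dashboard    # rule the dashboard module in or out
    ceph mgr module enable dashboard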