[ceph-users] Upgrade Ceph 16.2.10 to 17.2.x for Openstack RBD storage

2022-12-04 Thread Zakhar Kirpichenko
Hi! I'm planning to upgrade our Ceph cluster from Pacific (16.2.10) to Quincy (17.2.x). The cluster is used for Openstack block storage (RBD), Openstack version is Wallaby built on Ubuntu 20.04. Is anyone using Ceph Quincy (17.2.x) with Openstack Wallaby? If you are, please let me know if you've
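For reference, a minimal upgrade sketch, assuming a cephadm-managed cluster (a package-based install would instead upgrade mons, mgrs, OSDs and clients by hand in that order); 17.2.5 below is just a stand-in for whichever 17.2.x point release is chosen:

   ceph -s                                        # confirm HEALTH_OK before starting
   ceph orch upgrade start --ceph-version 17.2.5  # cephadm rolls the daemons in a safe order
   ceph orch upgrade status                       # watch progress; the upgrade can be paused or stopped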

[ceph-users] pool min_size

2022-12-04 Thread Christopher Durham
Hello, I have an ec pool set as 6+2. I have noticed that when rebooting servers during system upgrades, I get pgs set to inactive while the osds are down. I then discovered that my min_size for the pool is set to 7, which makes sense: if I reboot two servers that host a pg with OSDs on
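For context, min_size on an EC pool defaults to k+1, i.e. 7 for 6+2, so losing two hosts that hold shards of the same PG leaves only 6 shards and the PG goes inactive. A hedged sketch of inspecting and temporarily lowering it during maintenance ("mypool" and "myprofile" are placeholder names; running at min_size = k means writes have no parity redundancy, so restore the value afterwards):

   ceph osd erasure-code-profile get myprofile   # confirm k=6, m=2
   ceph osd pool get mypool min_size             # expect 7 with the default k+1
   ceph osd pool set mypool min_size 6           # temporary, for the duration of the reboots only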

[ceph-users] multisite sync error

2022-12-04 Thread Ramin Najjarbashi
Hi, I have two zonegroups inside a realm, each with a single zone, laid out as below: [ASCII diagram: one REALM containing zonegroup 1 (Master) and zonegroup 2, each with its own zone]
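Without the exact error in hand, a generic first-pass diagnostic sketch for multisite sync, run from an RGW node in the affected zone:

   radosgw-admin sync status            # metadata/data sync state relative to the master zone
   radosgw-admin sync error list        # recently recorded sync errors, if any
   radosgw-admin period get             # check that both zonegroups agree on the current period
   radosgw-admin period update --commit # only after deliberate zone/zonegroup config changes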

[ceph-users] Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-04 Thread Boris Behrens
@Marius: no swap at all. I'd rather buy more memory than use swap :) On Sun, 4 Dec 2022 at 20:10, Marius Leustean < marius.l...@gmail.com> wrote: > Hi Boris > > Do you have swap enabled on any of the OSD hosts? That may slow down > RocksDB drastically. > > On Sun, Dec 4, 2022 at 8:59 PM

[ceph-users] Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-04 Thread Boris Behrens
@Alex: the issue is gone for now, but I fear it might come back sometime. The cluster was running fine for months. I'll check if we can restart the switches easily. Host reboots should also be no problem. There is no "implicated OSD" message in the logs. All OSDs were recreated 3 months ago. (sync

[ceph-users] Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-04 Thread Marius Leustean
Hi Boris Do you have swap enabled on any of the OSD hosts? That may slow down RocksDB drastically. On Sun, Dec 4, 2022 at 8:59 PM Alex Gorbachev wrote: > Hi Boris, > > These waits seem to be all over the place. Usually, in the main ceph.log > you see "implicated OSD" messages - I would try to
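A quick way to verify that on each OSD host (plain util-linux/procps tools, nothing Ceph-specific):

   swapon --show         # empty output means no active swap device
   free -h               # swap total and current usage
   sysctl vm.swappiness  # how eagerly the kernel swaps when swap is present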

[ceph-users] Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-04 Thread Alex Gorbachev
Hi Boris, These waits seem to be all over the place. Usually, in the main ceph.log you see "implicated OSD" messages - I would try to find some commonality with either a host, switch, or something like that. Can be bad ports/NICs, LACP problems, even bad cables sometimes. I try to isolate an
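A sketch of that search, assuming the default log location on the mon hosts (paths differ for containerized deployments):

   grep -i implicated /var/log/ceph/ceph.log         # cluster log lines naming the OSDs blamed for slow ops
   zgrep -i implicated /var/log/ceph/ceph.log.*.gz   # include rotated logs
   ceph health detail                                # per-OSD slow-op summary while the problem is active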

[ceph-users] Dilemma with PG distribution

2022-12-04 Thread Boris Behrens
Hi, I am just evaluating our cluster configuration again, because we had a very bad incident with laggy OSDs that shut down the entire cluster. We use datacenter SSDs in different sizes (2, 4, 8 TB) and someone said that I should not go beyond a specific number of PGs on certain device classes.
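The actual distribution is easy to measure: CRUSH gives larger OSDs proportionally more PGs because of their higher weight, and the per-OSD count is visible directly (no assumptions beyond a working admin keyring):

   ceph osd df tree                 # the PGS column shows PGs per OSD, grouped by host and device class
   ceph osd pool autoscale-status   # what the autoscaler would set pg_num to per pool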

[ceph-users] Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-04 Thread Boris Behrens
Hi Alex, I am searching for a log line that points me in the right direction. From what I've seen, I couldn't find a specific host, OSD, or PG that was leading to this problem. But maybe I am looking at the wrong logs. I have around 150k lines that look like this:
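When the cluster log alone isn't conclusive, the OSD admin socket keeps a history of its slowest recent ops; a sketch, with osd.12 as a made-up example ID, run on the host where that OSD lives:

   ceph daemon osd.12 dump_historic_slow_ops   # slowest recent ops with per-stage event timestamps
   ceph daemon osd.12 dump_blocked_ops         # ops currently blocked, if the problem is still live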

[ceph-users] Re: Tuning CephFS on NVME for HPC / IO500

2022-12-04 Thread Janne Johansson
On Sat, 3 Dec 2022 at 22:52, Sebastian wrote: > > One thing to add to this discussion. > I had a lot of problems with my clusters. I spent some time debugging. > What I found, and confirmed on AMD nodes: everything starts working > like a charm when I added the kernel param iommu=pt > Plus some
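For anyone wanting to try that parameter, a sketch for a Debian/Ubuntu-style host (RHEL-family systems use grub2-mkconfig instead of update-grub); reboot afterwards:

   # /etc/default/grub -- append to the existing options
   GRUB_CMDLINE_LINUX_DEFAULT="... iommu=pt"
   sudo update-grub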

[ceph-users] Re: Tuning CephFS on NVME for HPC / IO500

2022-12-04 Thread Manuel Holtgrewe
Dear Sebastian, Thank you for this insight. It sounds like something that is easy to try. Does this relate to the Ceph cluster? My use case is cephfs only. All my clients are Intel based and strictly separated from the Ceph servers. Everything is bare metal. Most information I found on IOMMU
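A quick check of what a given node is currently booted with, before changing anything:

   cat /proc/cmdline                      # shows whether iommu=pt (or intel_iommu/amd_iommu) is already set
   dmesg | grep -iE 'iommu|dmar|amd-vi'   # kernel messages about the IOMMU mode chosen at boot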