Hi!
I'm planning to upgrade our Ceph cluster from Pacific (16.2.10) to Quincy
(17.2.x). The cluster is used for OpenStack block storage (RBD); the
OpenStack version is Wallaby, built on Ubuntu 20.04.
Is anyone using Ceph Quincy (17.2.x) with OpenStack Wallaby? If you are,
please let me know if you've
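For anyone planning the same jump, a minimal pre-flight check before
starting might look like the sketch below; it only covers version and
health checks, not the full upgrade procedure from the release notes:

    # Confirm every daemon is on the same Pacific point release first
    ceph versions
    # Start from a healthy cluster
    ceph health detail
    # Only after ALL daemons have been restarted on Quincy, finalize:
    ceph osd require-osd-release quincy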
Hello,
I have an EC pool set up as 6+2 (k=6, m=2). I have noticed that when
rebooting servers during system upgrades, some PGs go inactive while the
OSDs are down. I then discovered that min_size for the pool is set to 7,
which makes sense: if I reboot two servers that host a PG with OSDs on
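The usual workaround during planned maintenance is sketched below;
"ecpool" is a placeholder name, and dropping min_size to k means one more
lost shard makes data unavailable, so treat it as strictly temporary:

    ceph osd pool get ecpool min_size     # expect 7, i.e. k+1
    ceph osd set noout                    # avoid rebalancing during the reboot
    ceph osd pool set ecpool min_size 6   # allow I/O with two hosts down (temporary!)
    # ... reboot, wait for the host's OSDs to rejoin ...
    ceph osd pool set ecpool min_size 7
    ceph osd unset noout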
Hi
I have two zonegroups inside a realm, each one containing a zone, laid
out like this:

REALM
+----------------------------------------------------+
|  +----------------------+    +------------------+  |
|  |                      |    |                  |  |
|  | zonegroup 1 (Master) |    | zonegroup 2      |  |
|  |                      |    |                  |  |
|  +----------------------+    +------------------+  |
+----------------------------------------------------+
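For reference, the configured multisite layout can be dumped with the
standard radosgw-admin subcommands (output trimmed here):

    radosgw-admin realm list
    radosgw-admin zonegroup list
    radosgw-admin period get   # shows zonegroups, zones and the master flags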
@Marius:
no swap at all. I'd rather buy more memory than use swap :)
On Sun, Dec 4, 2022 at 20:10, Marius Leustean <
marius.l...@gmail.com> wrote:
> Hi Boris
>
> Do you have swap enabled on any of the OSD hosts? That may slow down
> RocksDB drastically.
>
> On Sun, Dec 4, 2022 at 8:59 PM
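A quick way to verify the same thing on any OSD host, using plain Linux
tools:

    swapon --show          # no output means no active swap devices
    free -h                # the Swap row should read 0B
    sysctl vm.swappiness   # if swap must stay, a low value limits its use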
@Alex:
the issue is resolved for now, but I fear it might come back sometime. The
cluster was running fine for months.
I'll check if we can restart the switches easily. Host reboots should also
be no problem.
There is no "implicated OSD" message in the logs.
All OSDs were recreated 3 months ago. (sync
Hi Boris
Do you have swap enabled on any of the OSD hosts? That may slow down
RocksDB drastically.
On Sun, Dec 4, 2022 at 8:59 PM Alex Gorbachev
wrote:
> Hi Boris,
>
> These waits seem to be all over the place. Usually, in the main ceph.log
> you see "implicated OSD" messages - I would try to
Hi Boris,
These waits seem to be all over the place. Usually, in the main ceph.log
you see "implicated OSD" messages - I would try to find some commonality
with either a host, switch, or something like that. Can be bad ports/NICs,
LACP problems, even bad cables sometimes. I try to isolate an
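A rough way to look for that commonality, assuming the default log path
and that your release logs the "implicated osds" wording (both are
assumptions worth checking first):

    grep -i 'implicated osds' /var/log/ceph/ceph.log \
      | grep -oiE 'implicated osds [0-9,]+' \
      | sort | uniq -c | sort -rn | head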
Hi,
I am just evaluating our cluster configuration again, because we had a
very bad incident with laggy OSDs that shut down the entire cluster.
We use datacenter SSDs in different sizes (2, 4, 8 TB), and someone said
that I should not go beyond a specific number of PGs on certain device
classes.
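For what it's worth, the common rule of thumb is on the order of 100 PG
replicas per OSD; e.g. with 100 OSDs and replica size 3, that's roughly
100 * 100 / 3 ≈ 3300 PGs across all pools. The current per-OSD count is
easy to check:

    ceph osd df tree                 # the PGS column shows PG replicas per OSD
    ceph osd pool autoscale-status   # compare against the autoscaler's targets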
Hi Alex,
I am searching for a log line that points me in the right direction. From
what I've seen, I could not find a specific host, OSD, or PG that was
leading to this problem.
But maybe I am looking at the wrong logs.
I have around 150k lines that look like this:
On Sat, Dec 3, 2022 at 22:52, Sebastian wrote:
>
> One thing to add to this discussion.
> I had a lot of problems with my clusters. I spent some time debugging.
> What I found, and what I confirmed on AMD nodes: everything starts working
> like a charm when I added the kernel parameter iommu=pt.
> Plus some
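On Debian/Ubuntu-style hosts, setting that parameter persistently looks
roughly like this; a sketch only, so test on a single node first:

    # /etc/default/grub -- append iommu=pt to the existing parameters
    GRUB_CMDLINE_LINUX_DEFAULT="... iommu=pt"

    sudo update-grub && sudo reboot
    cat /proc/cmdline   # afterwards, verify the parameter is present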
Dear Sebastian,
Thank you for this insight. It sounds like something that is easy to try.
Does this relate to the Ceph cluster itself?
My use case is CephFS only. All my clients are Intel-based and strictly
separated from the Ceph servers. Everything is bare metal.
Most of the information I found on IOMMU