[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Mark Nelson
Hi Cory, Thanks for the excellent information here!  I'm super curious how much the kv cache is using in this case.  If you happen to have a dump from the perf counters that includes the prioritycache subsystem that would be ideal.  By default, onode (meta) and rocksdb (except for onodes
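For reference, a perf-counter dump like that can be pulled from the OSD admin socket; a minimal sketch, assuming osd.0 is one of the affected index OSDs, jq is available, and the prioritycache sections keep their usual names:

  # dump all perf counters for osd.0 and keep only the prioritycache* sections
  ceph daemon osd.0 perf dump | jq 'with_entries(select(.key | startswith("prioritycache")))'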

[ceph-users] Re: How can I clone data from a faulty bluestore disk?

2024-02-02 Thread Eugen Block
Hi, if the OSDs are deployed as LVs (by ceph-volume) you could try to do a pvmove to a healthy disk. There was a thread here a couple of weeks ago explaining the steps. I don’t have it at hand right now, but it should be easy to find. Of course, there’s no guarantee that this will be
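A minimal sketch of that pvmove approach, assuming /dev/sdx is the failing disk, /dev/sdy the replacement, and ceph-<vgname> the volume group that ceph-volume created (all names are placeholders):

  # add the new disk to the OSD's volume group
  pvcreate /dev/sdy
  vgextend ceph-<vgname> /dev/sdy
  # migrate all extents off the failing disk, then remove it from the VG
  pvmove /dev/sdx /dev/sdy
  vgreduce ceph-<vgname> /dev/sdx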

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Cory Snyder
1024 PGs on NVMe. From: Anthony D'Atri Sent: Friday, February 2, 2024 2:37 PM To: Cory Snyder Subject: Re: [ceph-users] OSD read latency grows over time  Thanks. What type of media are your index OSDs? How many PGs? > On Feb 2, 2024, at 2:32 PM, Cory Snyder wrote: > > Yes, we changed

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Cory Snyder
Yes, we changed osd_memory_target to 10 GB on just our index OSDs. These OSDs have over 300 GB of lz4 compressed bucket index omap data. Here is a graph showing the latencies before/after that single change: https://pasteboard.co/IMCUWa1t3Uau.png Cory Snyder From: Anthony D'Atri Sent:
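As an illustration of that single change, the target can be raised per OSD or, if the index OSDs are the only ones in a device class, via a class mask (the OSD id and class name below are placeholders):

  # 10 GiB memory target for one index OSD
  ceph config set osd.12 osd_memory_target 10737418240
  # or for every OSD in a device class
  ceph config set osd/class:nvme osd_memory_target 10737418240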

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Anthony D'Atri
You adjusted osd_memory_target? Higher than the default 4GB? > > > Another thing that we've found is that rocksdb can become quite slow if it > doesn't have enough memory for internal caches. As our cluster usage has > grown, we've needed to increase OSD memory in accordance with bucket

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Cory Snyder
We've seen issues with high index OSD latencies in multiple scenarios over the past couple of years. The issues related to rocksdb tombstones could certainly be relevant, but compact on deletion has been very effective for us in that regard. Recently, we experienced a similar issue at a higher
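For anyone searching the archives: the compact-on-deletion behaviour mentioned here is driven by OSD-level RocksDB options (the option names below are from memory, so verify them against your release, and the trigger/window values are only examples), and a one-off compaction can also be forced:

  # auto-compact SST files once enough tombstones accumulate in a sliding window
  ceph config set osd rocksdb_cf_compact_on_deletion true
  ceph config set osd rocksdb_cf_compact_on_deletion_trigger 16384
  ceph config set osd rocksdb_cf_compact_on_deletion_sliding_window 32768
  # manually compact a struggling OSD (OSD id is a placeholder)
  ceph tell osd.123 compact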

[ceph-users] Re: Unable to mount ceph

2024-02-02 Thread Albert Shih
On 02/02/2024 at 16:34:17+0100, Albert Shih wrote: > Hi, > > > A little basic question. > > I created a volume with > > ceph fs volume > > then a subvolume called «erasure». I can see that with > > root@cthulhu1:/etc/ceph# ceph fs subvolume info cephfs erasure > { > "atime":

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Chris Palmer
We have fundamental problems with the concept of cephadm and its direction of travel. But that's a different story. The nub of this problem is a design incompatibility with MGR and the PyO3 package that python-cryptography relies on. It's actually unsafe as it is, and the new package just

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Brian Chow
Would migrating to a cephadm orchestrated docker/podman cluster be an acceptable workaround? We are running that config with reef containers on Debian 12 hosts, with a couple of debian 12 clients successfully mounting cephfs mounts, using the reef client packages directly on Debian. On Fri, Feb

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Casey Bodley
On Fri, Feb 2, 2024 at 11:21 AM Chris Palmer wrote: > > Hi Matthew > > AFAIK the upgrade from quincy/deb11 to reef/deb12 is not possible: > > * The packaging problem you can work around, and a fix is pending > * You have to upgrade both the OS and Ceph in one step > * The MGR will not run

[ceph-users] Re: How can I clone data from a faulty bluestore disk?

2024-02-02 Thread Igor Fedotov
Hi Carl, you might want to use ceph-objectstore-tool to export PGs from faulty OSDs and import them back to healthy ones. The process could be quite tricky though. There is also pending PR (https://github.com/ceph/ceph/pull/54991) to make the tool more tolerant to disk errors. The patch
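A minimal sketch of that export/import flow, assuming both OSDs are stopped and that osd.12, osd.27, PG 3.1f and the backup path are placeholders:

  # export one PG from the faulty OSD to a file
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid 3.1f --op export --file /mnt/backup/3.1f.export
  # import it into a healthy OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27 \
      --op import --file /mnt/backup/3.1f.export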

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Chris Palmer
Hi Matthew AFAIK the upgrade from quincy/deb11 to reef/deb12 is not possible: * The packaging problem you can work around, and a fix is pending * You have to upgrade both the OS and Ceph in one step * The MGR will not run under deb12 due to the PyO3 lack of support for subinterpreters.

[ceph-users] How can I clone data from a faulty bluestore disk?

2024-02-02 Thread Carl J Taylor
Hi, I have a small cluster with some faulty disks within it and I want to clone the data from the faulty disks onto new ones. The cluster is currently down and I am unable to do things like ceph-bluestore-fsck but ceph-bluestore-tool bluefs-export does appear to be working. Any help would be
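For reference, the bluefs-export invocation looks roughly like this (the OSD id and output directory are placeholders, and the OSD must be stopped):

  # copy the BlueFS contents (the RocksDB files) of osd.5 to a rescue directory
  ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-5 --out-dir /mnt/rescue/osd.5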

[ceph-users] Unable to mount ceph

2024-02-02 Thread Albert Shih
Hi, A little basic question. I created a volume with ceph fs volume then a subvolume called «erasure». I can see that with root@cthulhu1:/etc/ceph# ceph fs subvolume info cephfs erasure { "atime": "2024-02-02 11:02:07", "bytes_pcent": "undefined", "bytes_quota": "infinite",
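A minimal sketch of how such a subvolume is usually mounted (the client name, secret file and the exact path returned by getpath are placeholders):

  # ask CephFS where the subvolume lives
  ceph fs subvolume getpath cephfs erasure
  # mount that path with the kernel client, e.g.
  mount -t ceph :/volumes/_nogroup/erasure/<uuid> /mnt/erasure \
      -o name=myclient,secretfile=/etc/ceph/myclient.secret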

[ceph-users] Re: XFS on top of RBD, overhead

2024-02-02 Thread Maged Mokhtar
On 02/02/2024 16:41, Ruben Vestergaard wrote: Hi group, Today I conducted a small experiment to test an assumption of mine, namely that Ceph incurs a substantial network overhead when doing many small files. One RBD was created, and on top of that an XFS containing 1.6 M files, each with

[ceph-users] Re: XFS on top of RBD, overhead

2024-02-02 Thread Ruben Vestergaard
On Fri, Feb 02 2024 at 07:51:36 -0700, Josh Baergen wrote: On Fri, Feb 2, 2024 at 7:44 AM Ruben Vestergaard wrote: Is the RBD client performing partial object reads? Is that even a thing? Yup! The rados API has both length and offset parameters for reads

[ceph-users] Re: XFS on top of RBD, overhead

2024-02-02 Thread Josh Baergen
On Fri, Feb 2, 2024 at 7:44 AM Ruben Vestergaard wrote: > Is the RBD client performing partial object reads? Is that even a thing? Yup! The rados API has both length and offset parameters for reads (https://docs.ceph.com/en/latest/rados/api/librados/#c.rados_aio_read) and writes

[ceph-users] XFS on top of RBD, overhead

2024-02-02 Thread Ruben Vestergaard
Hi group, Today I conducted a small experiment to test an assumption of mine, namely that Ceph incurs a substantial network overhead when doing many small files. One RBD was created, and on top of that an XFS containing 1.6 M files, each with size 10 kiB: # rbd info libvirt/bobtest
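A rough reproduction of that setup, with the image size and file loop only as placeholders (the mapped device path may also show up as /dev/rbdN depending on udev rules):

  # create and map the RBD image, then put an XFS with many small files on it
  rbd create libvirt/bobtest --size 100G
  rbd map libvirt/bobtest
  mkfs.xfs /dev/rbd/libvirt/bobtest
  mount /dev/rbd/libvirt/bobtest /mnt/bobtest
  for i in $(seq 1 1600000); do head -c 10240 /dev/urandom > /mnt/bobtest/f$i; done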

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Matthew Darwin
Chris, Thanks for all the investigations you are doing here. We're on quincy/debian11.  Is there any working path at this point to reef/debian12?  Ideally I want to go in two steps.  Upgrade ceph first or upgrade debian first, then do the upgrade to the other one. Most of our infra is

[ceph-users] PG upmap corner cases that silently fail

2024-02-02 Thread Andras Pataki
Hi cephers, I've been looking into better balancing our clusters with upmaps lately, and ran into upmap cases that behave in a less than ideal way.  If there is any cycle in the upmaps like ceph osd pg-upmap-items a b b a or ceph osd pg-upmap-items a b b c c a the upmap validation passes,
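For anyone wanting to check their own maps, inspecting and removing upmap entries looks roughly like this (the PG id is a placeholder):

  # list the pg-upmap-items entries currently in the osdmap
  ceph osd dump | grep pg_upmap_items
  # drop the mapping for one PG
  ceph osd rm-pg-upmap-items 3.1a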

[ceph-users] Problems adding a new host via orchestration.

2024-02-02 Thread Gary Molenkamp
Happy Friday all.  I was hoping someone could point me in the right direction or clarify any limitations that could be impacting an issue I am having. I'm struggling to add a new set of hosts to my ceph cluster using cephadm and orchestration.  When trying to add a host:     "ceph orch host
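For comparison, the usual flow for adding a host under cephadm looks roughly like this (hostname and IP are placeholders):

  # distribute the cluster's cephadm ssh key to the new host, then add it
  ceph cephadm get-pub-key > ~/ceph.pub
  ssh-copy-id -f -i ~/ceph.pub root@newhost
  ceph orch host add newhost 10.0.0.42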

[ceph-users] Re: Understanding subvolumes

2024-02-02 Thread Robert Sander
On 01.02.24 00:20, Matthew Melendy wrote: In our department we're getting started with Ceph 'reef', using the Ceph FUSE client for our Ubuntu workstations. So far so good, except I can't quite figure out one aspect of subvolumes. AFAIK subvolumes were introduced to be used with Kubernetes and

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Tobias Urdin
I found the internal note I made about it, see below. When we trim thousands of OMAP keys in RocksDB this calls SingleDelete() in the RocksDBStore in Ceph, this causes tombstones in the RocksDB database. These thousands of tombstones that each needs to be iterated over

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Tobias Urdin
Chiming in here, just so that it’s indexed in archives. We’ve had a lot of issues with tombstones when running RGW usage logging and when we trim those the Ceph OSD hosting that usage.X object will basically kill the OSD performance due to the tombstones being so many, restarting the OSD
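One way to clean this up, consistent with the tombstone discussion in this thread, is to trim and then compact the affected OSD; a minimal sketch (dates and the OSD id are placeholders):

  # trim old RGW usage log entries
  radosgw-admin usage trim --start-date=2023-01-01 --end-date=2023-06-30
  # then compact the OSD holding the usage.X object so the tombstones get cleared
  ceph tell osd.42 compact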

[ceph-users] RBD mirroring to an EC pool

2024-02-02 Thread Jan Kasprzak
Hello, Ceph users, I would like to use my secondary Ceph cluster for backing up RBD OpenNebula volumes from my primary cluster using mirroring in image+snapshot mode. Because it is for backups only, not a cold-standby, I would like to use erasure coding on the secondary side to save a
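For the EC side, the usual pattern is a replicated pool for the image metadata plus an erasure-coded data pool (pool names, PG counts and image size below are placeholders; whether rbd-mirror will honour this layout for the images it creates is a separate question):

  # erasure-coded data pool with overwrites enabled, plus a replicated metadata pool
  ceph osd pool create rbd-ec-data 32 32 erasure
  ceph osd pool set rbd-ec-data allow_ec_overwrites true
  ceph osd pool create rbd-backup 32 32 replicated
  rbd pool init rbd-backup
  # images keep their header in the replicated pool and their data in the EC pool
  rbd create --size 100G --pool rbd-backup --data-pool rbd-ec-data testimage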

[ceph-users] Re: Ceph Dashboard failed to execute login

2024-02-02 Thread Michel Niyoyita
Thank you very much, Sir, now it works. Michel On Fri, Feb 2, 2024 at 11:55 AM Eugen Block wrote: > Have you tried to enable it? > > # ceph dashboard ac-user-enable admin > > Quoting Michel Niyoyita: > > > Hello team, > > > > I failed to log in to my ceph dashboard which is running pacific

[ceph-users] Re: Ceph Dashboard failed to execute login

2024-02-02 Thread Eugen Block
Have you tried to enable it? # ceph dashboard ac-user-enable admin Quoting Michel Niyoyita: Hello team, I failed to log in to my ceph dashboard, which is running the pacific version and was deployed using ceph-ansible. I have set the admin password using the following command: "ceph dashboard

[ceph-users] Ceph Dashboard failed to execute login

2024-02-02 Thread Michel Niyoyita
Hello team, I failed to log in to my ceph dashboard, which is running the pacific version and was deployed using ceph-ansible. I have set the admin password using the following command: "ceph dashboard ac-user-set-password admin -i ceph-dash-pass", where ceph-dash-pass contains the real password. I am

[ceph-users] Re: Cannot recreate monitor in upgrade from pacific to quincy (leveldb -> rocksdb)

2024-02-02 Thread Eugen Block
I decided to try to bring the mon back manually after looking at the logs without any findings. It's kind of ugly but it worked. The problem with that approach is that I had to take down a second MON to inject a new monmap (which then includes the failed MON), restart it and do the same
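A minimal sketch of that monmap injection, with mon names and address as placeholders (on a msgr2 cluster the address may need monmaptool's --addv form):

  # on the stopped helper mon: extract the current monmap, add the failed mon back, inject it
  ceph-mon -i mon2 --extract-monmap /tmp/monmap
  monmaptool --add mon1 192.168.0.11:6789 /tmp/monmap
  ceph-mon -i mon2 --inject-monmap /tmp/monmap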

[ceph-users] Re: Cannot recreate monitor in upgrade from pacific to quincy (leveldb -> rocksdb)

2024-02-02 Thread Mark Schouten
Hi, Cool, thanks! As for the global_id_reclaim settings: root@proxmox01:~# ceph config get mon auth_allow_insecure_global_id_reclaim false root@proxmox01:~# ceph config get mon auth_expose_insecure_global_id_reclaim true root@proxmox01:~# ceph config get mon