[ceph-users] Running trim / discard on an OSD

2023-09-03 Thread Victor Rodriguez
TL;DR Is there a way to run trim / discard on an OSD? Long story: I have a Proxmox-Ceph cluster with some OSDs as storage for VMs. Discard works perfectly in this cluster. For lab and testing purposes I deploy Proxmox-Ceph clusters as Proxmox VMs in this cluster using nested virtualization,
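
There is no one-shot "fstrim for OSDs", but BlueStore can pass discards down to its device as it frees space, which is what a nested lab like this would need. A minimal sketch, assuming the bdev_* options of recent releases (option names vary between versions, so verify them first; OSDs are restarted here so the settings take effect):

    # let BlueStore issue discards to the underlying (virtual) disk as space is freed
    ceph config set osd bdev_enable_discard true
    ceph config set osd bdev_async_discard true    # queue discards asynchronously
    # restart each OSD so the bdev settings are picked up, e.g.:
    systemctl restart ceph-osd@0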

[ceph-users] 3 node clusters and a corner case behavior

2023-03-03 Thread Victor Rodriguez
Hello, Before we start, I'm fully aware that this kind of setup is not recommended by any means and I'm familiar with its implications. I'm just trying to practice extreme situations, just in case... I have a test cluster with: 3 nodes with Proxmox 7.3 + Ceph Quincy 17.2.5 3 monitors + 3

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-02-15 Thread Victor Rodriguez
hat histogram from time to time as a way to measure the OSD "health"? Again, thanks everyone. On 1/30/23 18:18, Victor Rodriguez wrote: On 1/30/23 15:15, Ana Aviles wrote: Hi, Josh already suggested, but I will one more time. We had similar behaviour upgrading from Nautilu

[ceph-users] Re: No such file or directory when issuing "rbd du"

2023-02-10 Thread Victor Rodriguez
I've seen that happen when an rbd image or a snapshot is being removed and you cancel the operation, especially if they are big or the storage is relatively slow. The rbd image will stay "half removed" in the pool. Check "rbd ls -p POOL" vs "rbd ls -l -p POOL" outputs: the first may have one or
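
A quick way to spot such a leftover, assuming a pool named POOL (pool and image names here are placeholders):

    rbd ls -p POOL         # the short listing may still show the half-removed image
    rbd ls -l -p POOL      # the long listing may error out or skip it
    # if it is indeed a leftover, retrying the removal usually completes it:
    rbd rm POOL/IMAGE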

[ceph-users] Throttle down rebalance with Quincy

2023-02-09 Thread Victor Rodriguez
Hello, I'm adding OSDs to a 5 node cluster using Quincy 17.2.5. The network is a bonded 2x10G link. The issue I'm having is that the rebalance operation seems to impact client I/O, and running VMs do not respond well. OSDs are big 6.4 TB NVMe drives, so there will be a lot of data to move. With previous
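
Quincy defaults to the mClock scheduler, where the classic backfill knobs may be ignored unless the scheduler itself is told to favour client traffic. A hedged sketch, with option and profile names as documented for 17.2.x (check them against the exact release in use):

    # prefer client I/O over recovery/backfill with mClock
    ceph config set osd osd_mclock_profile high_client_ops
    # alternative: fall back to the wpq scheduler and the traditional limits
    # (requires an OSD restart to take effect)
    #ceph config set osd osd_op_queue wpq
    #ceph config set osd osd_max_backfills 1
    #ceph config set osd osd_recovery_max_active 1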

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-30 Thread Victor Rodriguez
On 1/30/23 15:15, Ana Aviles wrote: Hi, Josh already suggested it, but I will one more time. We had similar behaviour upgrading from Nautilus to Pacific. In our case compacting the OSDs did the trick. Thanks for chiming in! Unfortunately, in my case neither an online compaction (ceph tell
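
For reference, the compaction discussed here can be run online per OSD or offline with the OSD stopped; a sketch, assuming the usual data path /var/lib/ceph/osd/ceph-<id>:

    ceph tell 'osd.*' compact                                    # online compaction of all OSDs
    # offline, with the target OSD stopped:
    systemctl stop ceph-osd@0
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
    systemctl start ceph-osd@0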

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-30 Thread Victor Rodriguez
From: Victor Rodriguez Sent: 29 January 2023 22:40:46 To: ceph-users@ceph.io Subject: [ceph-users] Re: Very slow snaptrim operations blocking client I/O Looks like this is going to take a few days. I hope to

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-29 Thread Victor Rodriguez
of snapshots.  I'm not sure what the difference will be from our case versus a single large volume with a big snapshot. On 2023-01-28 20:45, Victor Rodriguez wrote: On 1/29/23 00:50, Matt Vandermeulen wrote: I've observed a similar horror when upgrading a cluster from Luminous to Nautilus, which

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-28 Thread Victor Rodriguez
osd_snap_trim_sleep_ssd value to let the cluster perform. I don't know how long this is going to take... Maybe recreating the OSDs and dealing with the rebalance is a better option? There's something ugly going on here... I would really like to put my finger on it. On 2023-01-28 19:43, Victor Rodriguez
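
The throttling mentioned here is driven by the snap trim sleep options, and snaptrim can also be paused cluster-wide while investigating; a sketch (the value is illustrative, not a recommendation):

    ceph config set osd osd_snap_trim_sleep_ssd 5    # seconds to sleep between trims on SSD OSDs
    ceph osd set nosnaptrim                          # pause snap trimming entirely
    ceph osd unset nosnaptrim                        # resume when ready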

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-28 Thread Victor Rodriguez
nts, that is, they are not disks for QEMU/Proxmox VMs. Maybe I have something misconfigured related to this?  This cluster is at least two and a half years old and never had this issue with snaptrims. Thanks in advance! On 1/27/23 17:29, Victor Rodriguez wrote: Ah yes, checked that too. Monitors and OSD's re

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Victor Rodriguez
and after the compact operation. I don't want to risk any more downtime until the scheduled maintenance window I have tomorrow, so I can't run the compact now. On Fri, Jan 27, 2023 at 6:52 AM Victor Rodriguez wrote: Hello, Asking for help with an issue. Maybe someone has a clue about what's

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Victor Rodriguez
13414G 0.4004 Istvan Szabo Staff Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- On 2023. Jan 27., at 23:30, Victor Rodriguez wrote:

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Victor Rodriguez
On Fri, Jan 27, 2023 at 8:52 AM Victor Rodriguez wrote: Hello, Asking for help with an issue. Maybe someone has a clue about what's going on. Using ceph 15.2.17 on Proxmox 7.3. A big VM had a snapshot

[ceph-users] Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Victor Rodriguez
Hello, Asking for help with an issue. Maybe someone has a clue about what's going on. Using ceph 15.2.17 on Proxmox 7.3. A big VM had a snapshot and I removed it. A bit later, nearly half of the PGs of the pool entered snaptrim and snaptrim_wait state, as expected. The problem is that such
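
To watch how far the trimming has progressed, the PG states can be checked directly; a small sketch:

    ceph -s                                                  # summary includes snaptrim / snaptrim_wait counts
    ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim    # PGs still trimming or queued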

[ceph-users] Re: Recovery or recreation of a monitor rocksdb

2022-04-05 Thread Victor Rodriguez
16:48, Konstantin Shalygin wrote: Hi, the fast way to fix the quorum issue is to redeploy the ceph-mon service. k On 1 Apr 2022, at 14:43, Victor Rodriguez wrote: Hello, Have a 3 node cluster using Proxmox + ceph version 14.2.22 (nautilus). After a power failure one of the monitors
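
On a Proxmox-managed cluster the redeploy suggested here is typically done with the pveceph tooling; a hedged sketch, assuming the two surviving monitors still form a quorum and using "pve1" as a placeholder node name (older Proxmox releases used pveceph destroymon / createmon instead):

    systemctl stop ceph-mon@pve1     # stop the broken monitor
    pveceph mon destroy pve1         # remove it from the monmap and config
    pveceph mon create               # recreate it on this node from the surviving quorum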

[ceph-users] Recovery or recreation of a monitor rocksdb

2022-04-01 Thread Victor Rodriguez
Hello, Have a 3 node cluster using Proxmox + ceph version 14.2.22 (nautilus). After a power failure one of the monitors does not start. The log states some kind of problem with its rocksdb but I can't really pinpoint the issue. The log is available at https://pastebin.com/TZrFrZ1u. How can

[ceph-users] Recreate pool device_health_metrics

2020-12-23 Thread Victor Rodriguez
Hello, TL;DR How can I recreate the device_health_metrics pool? I'm experimenting with Ceph Octopus v15.2.8 in a 3 node cluster under Proxmox 6.3. After initializing Ceph the usual way, a "device_health_metrics" pool is created as soon as I create the first manager. That pool has just 1 PG but
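
A hedged sketch of the recreate step, assuming pool deletion is allowed on the monitors and that the mgr devicehealth module will recreate the pool the next time it needs it:

    ceph config set mon mon_allow_pool_delete true
    ceph osd pool delete device_health_metrics device_health_metrics --yes-i-really-really-mean-it
    ceph config set mon mon_allow_pool_delete false
    # the devicehealth mgr module should recreate the pool on its own;
    # if not, it can be created by hand with a single PG:
    #ceph osd pool create device_health_metrics 1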

[ceph-users] CephFS and Samba/CIFS permissions (xattr)

2020-04-14 Thread Victor Rodriguez
Hello, I have a CephFS running correctly on v14.2.8. I also have a VM which runs Samba as AD controller and fileserver (Zentyal). My plan was to mount a CephFS path on that VM and make Samba share those files to a Windows network. But I can't make the shares work, as Samba is asking to mount the
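
A minimal sketch of the setup being attempted, with hypothetical mount point, share name, monitor address and keyring (the CephFS kernel client supports the extended attributes that Samba's acl_xattr module stores Windows ACLs in):

    # mount a CephFS path on the fileserver VM
    mount -t ceph 192.168.1.10:6789:/winshares /mnt/cephfs \
        -o name=samba,secretfile=/etc/ceph/samba.secret

    # /etc/samba/smb.conf share definition (fragment)
    # [winshare]
    #     path = /mnt/cephfs
    #     read only = no
    #     vfs objects = acl_xattr
    #     ea support = yes
    #     map acl inherit = yes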