[ceph-users] Re: Error removing snapshot schedule

2022-02-23 Thread Venky Shankar
On Thu, Feb 24, 2022 at 8:00 AM Jeremy Hansen wrote: > > Can’t figure out what I’m doing wrong. Is there another way to remove a > snapshot schedule? > > [ceph: root@cephn1 /]# ceph fs snap-schedule status / / testfs > {"fs": "testfs", "subvol": null, "path": "/", "rel_path": "/", "schedule": >
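A minimal sketch of how a schedule is usually removed with the snap-schedule CLI of that era; since the reply above is truncated, the exact arguments needed for a non-default filesystem are an assumption, mirroring the quoted status call:

```
# list schedules on the root of the non-default filesystem "testfs"
# (command copied from the message above)
ceph fs snap-schedule status / / testfs

# remove the schedule for that path; on a non-default filesystem the
# subvolume and filesystem name may have to be passed as well, in the
# same positional order as the status call (argument order is an assumption)
ceph fs snap-schedule remove /
```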

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-23 Thread Sebastian Mazza
Hi Igor, I let Ceph rebuild OSD.7. Then I added ``` [osd] debug bluefs = 20 debug bdev = 20 debug bluestore = 20 ``` to the ceph.conf of all 3 nodes and shut down all 3 nodes without writing anything to the pools on the HDDs (the Debian VM was not even running). Immed
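Laid out as it would appear in ceph.conf, the debug section quoted above is:

```
[osd]
        debug bluefs = 20
        debug bdev = 20
        debug bluestore = 20
```

On a cephadm-managed cluster the same effect could presumably be had at runtime with `ceph config set osd debug_bluefs 20` (and likewise for the other two options), without editing ceph.conf on each node.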

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-23 Thread Sebastian Mazza
Hi Alexander, thank you for your suggestion! All my nodes have ECC memory. However, I have now checked that it was recognized correctly on every system (dmesg | grep EDAC). Furthermore I checked whether an error occurred by using `edac-util` and also by searching the logs of the mainboard BMCs. Ev
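For reference, the two checks described there amount to something like the following (edac-util comes from the edac-utils package; the exact flag is an assumption):

```
# confirm the kernel's EDAC driver registered the memory controllers
dmesg | grep -i EDAC

# report corrected/uncorrected ECC error counters
edac-util -v
```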

[ceph-users] CephFS snaptrim bug?

2022-02-23 Thread Linkriver Technology
Hello, I have upgraded our Ceph cluster from Nautilus to Octopus (15.2.15) over the weekend. The upgrade went well as far as I can tell. Earlier today, noticing that our CephFS data pool was approaching capacity, I removed some old CephFS snapshots (taken weekly at the root of the filesystem), ke
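Not part of the quoted message, but a quick way to check whether snapshot trimming is actually progressing after such a removal is to look for PGs in the snaptrim/snaptrim_wait states and at the pool's pending removed snapshots (output details vary by release):

```
# PGs currently trimming (or waiting to trim) removed snapshots
ceph pg ls | grep -E 'snaptrim'

# per-pool view of snapshot removal still pending
ceph osd pool ls detail | grep -i removed_snaps
```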

[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Eugen Block
That is indeed unexpected, but good for you. ;-) Is the rest of the cluster healthy now? Quoting Gaël THEROND: So! Here is a really mysterious resolution. The issue vanished the moment I queried the OSD about its slow_ops history. I didn’t have time to do anything except to look for th

[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Gaël THEROND
So! Here is a really mysterious resolution. The issue vanished the moment I queried the OSD about its slow_ops history. I didn’t have time to do anything except look at the OSD ops history, which was actually empty :-) I’ll keep all your suggestions in case it ever comes back :-) Thanks a lot! L

[ceph-users] Re: MDS crash due to seemingly unrecoverable metadata error

2022-02-23 Thread Xiubo Li
Have you tried to back up and then remove the 'mds%d_openfiles.%x' object to see whether you can start the MDS? Thanks. On 2/23/22 7:07 PM, Wolfgang Mair wrote: Update: I managed to clear the inode errors by deleting the parent directory entry from the metadata pool. However the MDS still refuses
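A sketch of what "back up and then remove the openfiles object" could look like; the metadata pool name and the rank/index in the object name are assumptions and should be checked first:

```
# find the actual object name(s) - pool name "cephfs_metadata" is an assumption
rados -p cephfs_metadata ls | grep openfiles

# back the object up before removing it (here: MDS rank 0, index 0)
rados -p cephfs_metadata get mds0_openfiles.0 mds0_openfiles.0.backup

# remove it, then try starting the MDS again
rados -p cephfs_metadata rm mds0_openfiles.0
```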

[ceph-users] Re: Unclear on metadata config for new Pacific cluster

2022-02-23 Thread Adam Huffman
On Wed, 23 Feb 2022 at 11:25, Eugen Block wrote: > Hi, > > if you want to have DB and WAL on the same device, just don't specify > WAL in your drivegroup. It will be automatically created on the DB > device, too. In your case the rotational flag should be enough to > distinguish between data and

[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Gaël THEROND
Thanks a lot Eugen, I dumbly forgot about the rbd block prefix! I’ll try that this afternoon and tell you how it went. On Wed, 23 Feb 2022 at 11:41, Eugen Block wrote: > Hi, > > > How can I identify which operation this OSD is trying to achieve as > > osd_op() is a bit large ^^ ? > > I wou

[ceph-users] Re: Unclear on metadata config for new Pacific cluster

2022-02-23 Thread Eugen Block
Hi, if you want to have DB and WAL on the same device, just don't specify WAL in your drivegroup. It will be automatically created on the DB device, too. In your case the rotational flag should be enough to distinguish between data and DB. based on the suggestion in the docs that this wo
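A sketch of the kind of OSD service spec being described, with the rotational flag separating data from DB devices and no wal_devices section, so the WAL ends up on the DB device; the service id and host pattern are placeholders:

```
# hypothetical spec file for "ceph orch apply -i osd_spec.yml"
cat > osd_spec.yml <<'EOF'
service_type: osd
service_id: hdd_data_ssd_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  # no wal_devices: the WAL is created on the DB device automatically
EOF

ceph orch apply -i osd_spec.yml
```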

[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Eugen Block
Hi, How can I identify which operation this OSD is trying to achieve as osd_op() is a bit large ^^ ? I would start by querying the OSD for historic_slow_ops: ceph daemon osd.<id> dump_historic_slow_ops to see which operation it is. How can I identify the related images to this data chunk? You
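Spelled out, the two steps discussed here would look roughly like this; the OSD id, pool name and rbd_data prefix are placeholders:

```
# dump the slow-op history of the affected OSD (run on the node hosting it)
ceph daemon osd.<id> dump_historic_slow_ops

# map an object named rbd_data.<prefix>.<suffix> back to its RBD image by
# matching each image's block_name_prefix in the pool
for img in $(rbd ls <pool>); do
    rbd info <pool>/"$img" | grep -q '<prefix>' && echo "$img"
done
```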

[ceph-users] OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Gaël THEROND
Hi everyone, I've been having a really nasty issue for around two days where our cluster reports a bunch of SLOW_OPS on one of our OSDs, as shown here: https://paste.openstack.org/show/b3DkgnJDVx05vL5o4OmY/ Here is the cluster specification: * Used to store OpenStack-related data (VMs/Snapshots/Volumes/Swift).
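Not from the message itself, but a few commands that are typically useful when slow ops start inflating the monitor store (ids are placeholders; the mon data path differs on cephadm deployments):

```
# which daemons are currently reporting slow ops
ceph health detail | grep -i slow

# how large the monitor's store.db has grown
du -sh /var/lib/ceph/mon/*/store.db

# once the cluster is healthy again, the mon store can be compacted
ceph tell mon.<id> compact
```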

[ceph-users] MGR data on md RAID 1 or not

2022-02-23 Thread Roel van Meer
Hi list! We've got a Ceph cluster where the OS of the Ceph nodes lives on a set of SSD disks in mdadm RAID 1. We were wondering if there are any (performance) benefits of moving the MGR data away from this RAID 1 and onto a dedicated non-RAID SSD partition. The drawback would be reduced prot