[ceph-users] Re: slow pacific osd startup

2022-02-11 Thread Andrej Filipcic
On 11/02/2022 15:22, Igor Fedotov wrote: Hi Andrej, you might want to set debug_bluestore and debug_bluefs to 10 and check what's happening during the startup... Alternatively you might try to compact the slow OSD's DB using ceph_kvstore_tool and check if it helps to speed up the startup... wit

[ceph-users] Take the Ceph User Survey for 2022!

2022-02-11 Thread Mike Perez
Hi everyone! Be sure to make your voice heard by taking the Ceph User Survey before March 25, 2022. This information will help guide the Ceph community’s investment in Ceph and the Ceph community's future development. https://survey.zohopublic.com/zs/tLCskv Thank you to the Ceph User Survey Work

[ceph-users] Re: osds won't start

2022-02-11 Thread Mazzystr
This problem is solved. My links are indeed swapped: host0:/var/lib/ceph/osd/ceph-0 # ls -la block* lrwxrwxrwx 1 ceph ceph 23 Jan 15 15:13 block -> /dev/mapper/ceph-0block lrwxrwxrwx 1 ceph ceph 24 Jan 15 15:13 block.db -> /dev/mapper/ceph--0db lrwxrwxrwx 1 ceph ceph 25 Jan 15 15:13 block.wa
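For anyone hitting the same symptom, a minimal sketch of re-pointing the symlinks, using hypothetical device-mapper names (adjust to your own layout and stop the OSD first):

  # stop the OSD before touching its symlinks (or stop the container if self-rolled)
  systemctl stop ceph-osd@0
  # re-create each symlink so it points at the intended DM device; names below are examples
  ln -sfn /dev/mapper/ceph--0--block /var/lib/ceph/osd/ceph-0/block
  ln -sfn /dev/mapper/ceph--0--db /var/lib/ceph/osd/ceph-0/block.db
  ln -sfn /dev/mapper/ceph--0--wal /var/lib/ceph/osd/ceph-0/block.wal
  chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block*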

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Izzy Kulbe
Hi, thanks for the reply. Since this was a secondary backup anyway, recreating the FS in case everything fails was the plan; it would just have been good to know how we got it into an inoperable and irrecoverable state like this by simply running orch upgrade, to avoid running into it again in

[ceph-users] Re: osds won't start

2022-02-11 Thread Mazzystr
I set debug {bdev, bluefs, bluestore, osd} = 20/20 and restarted osd.0. Logs are here: -15> 2022-02-11T11:07:09.944-0800 7f93546c0080 10 bluestore(/var/lib/ceph/osd/ceph-0/block.wal) _read_bdev_label got bdev(osd_uuid 7755e0c2-b4bf-4cbe-bc9a-26042d5bdc52, size 0xba420, btime 2019-04-11T08:46:
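For reference, a sketch of one way to persist those debug levels so the restarted daemon picks them up; osd.0 and the 20/20 (log/memory) values follow the message above, while using the central config store for this is an assumption about how the settings are managed:

  # store the debug levels centrally before restarting osd.0
  ceph config set osd.0 debug_bdev 20/20
  ceph config set osd.0 debug_bluefs 20/20
  ceph config set osd.0 debug_bluestore 20/20
  ceph config set osd.0 debug_osd 20/20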

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Gregory Farnum
On Fri, Feb 11, 2022 at 10:53 AM Izzy Kulbe wrote: > Hi, > > If the MDS host has enough spare memory, setting > > `mds_cache_memory_limit`[*] to 9GB (or more if it permits) would get > > rid of this warning. Could you check if that improves the situation? > > Normally, the MDS starts trimming its
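For context, a sketch of how the suggested cache limit could be applied; the 9 GiB figure comes from the quoted advice, while the byte value and daemon targeting are assumptions:

  # set the MDS cache target to 9 GiB (the option takes bytes)
  ceph config set mds mds_cache_memory_limit 9663676416
  # confirm the value the daemons will use
  ceph config get mds mds_cache_memory_limit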

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Izzy Kulbe
Hi, If the MDS host has enough spare memory, setting > `mds_cache_memory_limit`[*] to 9GB (or more if it permits) would get > rid of this warning. Could you check if that improves the situation? > Normally, the MDS starts trimming its cache when it overshoots the > cache limit. > That won't work.

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Venky Shankar
On Fri, Feb 11, 2022 at 9:36 PM Izzy Kulbe wrote: > > Hi, > > at the moment no clients should be connected to the MDS(since the MDS doesn't > come up) and the cluster only serves these MDS. The MDS also didn't start > properly with mds_wipe_sessions = true. > > ceph health detail with the MDS tr

[ceph-users] Re: RBD map issue

2022-02-11 Thread Lo Re Giuseppe
root@fulen-w006:~# ll client.fulen.keyring -rw-r--r-- 1 root root 69 Feb 11 15:30 client.fulen.keyring root@fulen-w006:~# ll ceph.conf -rw-r--r-- 1 root root 118 Feb 11 19:15 ceph.conf root@fulen-w006:~# rbd -c ceph.conf --id fulen --keyring client.fulen.keyring map fulen-nvme-meta/test-loreg-3 rb

[ceph-users] Re: osds won't start

2022-02-11 Thread Mazzystr
I'm suspicious of cross-contamination of devices here. I was on CentOS for eons until Red Hat shenanigans pinned me to CentOS 7 and Nautilus. I had very well-defined udev rules that ensured dm devices were statically set and owned correctly and survived reboots. I seem to be struggling with this
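As an illustration of that kind of udev rule, a hypothetical sketch, assuming device-mapper names starting with "ceph" and that the ceph user should own them (the exact match keys depend on how the devices were created):

  # /etc/udev/rules.d/90-ceph-dm.rules (hypothetical example)
  # hand any DM device whose name starts with "ceph" to the ceph user
  ACTION=="add|change", KERNEL=="dm-*", ENV{DM_NAME}=="ceph*", OWNER="ceph", GROUP="ceph", MODE="0660"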

[ceph-users] Re: osds won't start

2022-02-11 Thread Mazzystr
I forgot to mention I freeze the cluster with 'ceph osd set no{down,out,backfill}'. Then I zyp up all hosts and reboot them. Only when everything is back up do I unset. My client IO patterns allow me to do this since it's a WORM data store with long spans of time between writes and reads. I have
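Spelled out one flag per command (ceph osd set takes a single flag at a time), the freeze/unfreeze sketch would look like this:

  # freeze state changes and backfill before the rolling reboots
  for flag in nodown noout nobackfill; do ceph osd set "$flag"; done
  # ...update and reboot hosts...
  # once every OSD is back up, release the flags
  for flag in nodown noout nobackfill; do ceph osd unset "$flag"; done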

[ceph-users] Re: osds won't start

2022-02-11 Thread Mazzystr
My clusters are self rolled. My start command is as follows podman run -it --privileged --pid=host --cpuset-cpus 0,1 --memory 2g --name ceph_osd0 --hostname ceph_osd0 -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /etc/ceph:/etc/ceph/ -v /var/lib/ceph/osd/ceph-0:/var/lib/ceph/osd/ceph-0 -v /

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Izzy Kulbe
Hi, at the moment no clients should be connected to the MDS(since the MDS doesn't come up) and the cluster only serves these MDS. The MDS also didn't start properly with mds_wipe_sessions = true. ceph health detail with the MDS trying to run: HEALTH_WARN 1 failed cephadm daemon(s); 3 large omap

[ceph-users] Re: slow pacific osd startup

2022-02-11 Thread Igor Fedotov
Hi Andrej, you might want to set debug_bluestore and debug_bluefs to 10 and check what's happening during the startup... Alternatively you might try to compact the slow OSD's DB using ceph_kvstore_tool and check if it helps to speed up the startup... Just in case - is bluefs_buffered_io set to
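For reference, a sketch of the offline compaction being suggested, assuming the usual /var/lib/ceph/osd/ceph-<id> data path and that the OSD is stopped while the tool runs:

  # the tool needs exclusive access, so stop the OSD first
  systemctl stop ceph-osd@<id>
  # compact the BlueStore RocksDB of the slow OSD
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
  systemctl start ceph-osd@<id>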

[ceph-users] Re: slow pacific osd startup

2022-02-11 Thread Andrej Filipcic
On 11/02/2022 15:05, Josh Baergen wrote: In particular, do you have bluestore_fsck_quick_fix_on_mount set to true? no, that's set to false. Andrej Josh On Fri, Feb 11, 2022 at 2:08 AM Eugen Block wrote: Hi, is there a difference in PG size on new and old OSDs or are they all similar in si

[ceph-users] Re: MDS crash when unlink file

2022-02-11 Thread Venky Shankar
Hi Arnaud, On Fri, Feb 11, 2022 at 2:42 PM Arnaud MARTEL wrote: > > Hi, > > MDSs are crashing on my production cluster when trying to unlink some files > and I need help :-). > When looking into the log files, I have identified some associated files and > I ran a scrub on the parent directory w

[ceph-users] Re: RBD map issue

2022-02-11 Thread Eugen Block
How are the permissions of the client keyring on both systems? Zitat von Lo Re Giuseppe : Hi, It's a single ceph cluster, I'm testing from 2 different client nodes. The caps are below. I think it's unlikely that the caps are the cause, as they work from one client node, same ceph user, and not fro
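A quick sketch of checking the keyring permissions on each node (paths follow the earlier rbd test and may differ on your systems):

  # compare ownership and mode on both client nodes
  ls -l client.fulen.keyring ceph.conf
  # the keyring only needs to be readable by the user running rbd
  chmod 600 client.fulen.keyring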

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Venky Shankar
On Fri, Feb 11, 2022 at 3:40 PM Izzy Kulbe wrote: > > Hi, > > I've tried to rm the mds0.openfiles and did a `ceph config set mds > mds_oft_prefetch_dirfrags false` but still with the same result of ceph > status reporting the daemon as up and not a lot more. > > I also tried setting the cache to r

[ceph-users] Re: RBD map issue

2022-02-11 Thread Lo Re Giuseppe
Hi, It's a single ceph cluster; I'm testing from 2 different client nodes. The caps are below. I think it's unlikely that the caps are the cause, as they work from one client node, same ceph user, and not from the other one... Cheers, Giuseppe [root@naret-monitor01 ~]# ceph auth get client.fulen exp

[ceph-users] Re: IO stall after 1 slow op

2022-02-11 Thread 黄俊艺
Hello Frank, We've observed a seemingly identical issue when `fstrim` is carried out on one of the RBD-backed iSCSI multipath devices (we use ceph-iscsi to map an RBD image to a local multipath device, which is formatted with an XFS filesystem). BTW, we use Nautilus 14.2.22.

[ceph-users] Re: RBD map issue

2022-02-11 Thread Eugen Block
Hi, the first thing coming to mind is the user's caps. Which permissions do they have? Have you compared 'ceph auth get client.fulen' on both clusters? Please paste the output from both clusters and redact sensitive information. Zitat von Lo Re Giuseppe : Hi all, This is my first po

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Izzy Kulbe
Hi, I've tried to rm the mds0.openfiles and did a `ceph config set mds mds_oft_prefetch_dirfrags false` but still with the same result of ceph status reporting the daemon as up and not a lot more. I also tried setting the cache to ridiculously small(128M) but the MDS' memory usage would still go

[ceph-users] MDS crash when unlink file

2022-02-11 Thread Arnaud MARTEL
Hi, MDSs are crashing on my production cluster when trying to unlink some files and I need help :-). When looking into the log files, I have identified some associated files and I ran a scrub on the parent directory with force,repair,recursive options. No errors were detected, but the problem p
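For reference, a sketch of that kind of scrub, assuming a filesystem named "cephfs" and a placeholder path (adjust both to your cluster):

  # ask rank 0 of the "cephfs" filesystem to scrub the parent directory
  ceph tell mds.cephfs:0 scrub start /path/to/parent recursive,repair,force
  # check progress and results
  ceph tell mds.cephfs:0 scrub status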

[ceph-users] Re: slow pacific osd startup

2022-02-11 Thread Eugen Block
Hi, is there a difference in PG size on new and old OSDs or are they all similar in size? Is there some fsck enabled during OSD startup? Zitat von Andrej Filipcic : Hi, with 16.2.7, some OSDs are very slow to start, eg it takes ~30min for an hdd (12TB, 5TB used) to become active. After
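One way to answer the fsck question is to query the BlueStore mount-time options; a sketch, assuming the values live in the mon config store:

  # any of these set to true adds work at OSD start
  ceph config get osd bluestore_fsck_on_mount
  ceph config get osd bluestore_fsck_on_mount_deep
  ceph config get osd bluestore_fsck_quick_fix_on_mount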

[ceph-users] Re: osds won't start

2022-02-11 Thread Eugen Block
Can you share some more information about how exactly you upgraded? It looks like a cephadm-managed cluster. Did you install OS updates on all nodes without waiting for the first one to recover? Maybe I'm misreading, so please clarify what your update process looked like. Zitat von Mazzystr : I

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Dan van der Ster
Hi, Is the memory ballooning while the MDS is active or could it be while it is rejoining the cluster? If the latter, this could be another case of: https://tracker.ceph.com/issues/54253 Cheers, Dan On Wed, Feb 9, 2022 at 7:23 PM Izzy Kulbe wrote: > > Hi, > > last weekend we upgraded one of ou

[ceph-users] RBD map issue

2022-02-11 Thread Lo Re Giuseppe
Hi all, This is my first post to this user group, I’m not a ceph expert, sorry if I say/ask anything trivial. On a Kubernetes cluster I have an issue in creating volumes from a (csi) ceph EC pool. I can reproduce the problem from rbd cli like this from one of the k8s worker nodes: “”” root@f

[ceph-users] slow pacific osd startup

2022-02-11 Thread Andrej Filipcic
Hi, with 16.2.7, some OSDs are very slow to start, eg it takes ~30min for an hdd (12TB, 5TB used) to become active. After initialization, there is 20-40min of extreme reading at ~150MB/s from the OSD, just after --- Uptime(secs): 602.2 total, 0.0 interval Flush(GB): cumulative 0.101, int