[ceph-users] Re: ceph orch status hangs forever

2021-05-19 Thread Sebastian Luna Valero
Hi, Here it is:

# cephadm shell -- ceph status
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
  cluster:
    id: 3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c
    health: HEALTH_WARN
            2 failed cephadm daemon(s)
  services:
    mon: 3 daemons, quorum

[ceph-users] fsck error: found stray omap data on omap_head

2021-05-19 Thread Pickett, Neale T
We just upgraded to Pacific, and I'm trying to clear warnings about legacy bluestore omap usage stats by running `ceph-bluestore-tool repair`, as instructed by the warning message. It's been going fine, but we are now getting this error: [root@vanilla bin]# ceph-bluestore-tool repair --path
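
(For reference, a minimal sketch of the per-OSD repair workflow being described, assuming each OSD is stopped before the offline repair; the OSD id is a placeholder:)

  systemctl stop ceph-osd@<id>
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>   # offline repair, as the omap usage warning suggests
  systemctl start ceph-osd@<id>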

[ceph-users] Re: Suitable 10G Switches for ceph storage - any recommendations?

2021-05-19 Thread Jeremy Austin
The CRS3xx series is no-frills but works. I have a 6-node cluster running just fine on them. Bonding should also work for 2x10G. Interestingly enough, MLAG was introduced in today's RouterOS v7 beta, but I'm skeptical that it will be stable yet. On Wed, May 19, 2021 at 1:22 AM Hermann

[ceph-users] Re: ceph orch status hangs forever

2021-05-19 Thread Eugen Block
Hi, can you paste the ceph status? The orchestrator is an MGR module; have you checked if the containers are up and running (assuming it’s cephadm based)? Do the logs also report the cluster as healthy? Quoting Sebastian Luna Valero: Hi, After an unscheduled power outage our Ceph
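
(A hedged sketch of the checks suggested here; daemon names are placeholders, and the container runtime may be docker instead of podman:)

  ceph status
  ceph health detail
  ceph mgr module ls   # confirm the orchestrator/cephadm modules are enabled
  cephadm ls           # on each host: list the cephadm-deployed daemons and their state
  podman ps            # verify the MGR/MON containers are actually running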

[ceph-users] ceph orch status hangs forever

2021-05-19 Thread Sebastian Luna Valero
Hi, After an unscheduled power outage our Ceph (Octopus) cluster reports a healthy state with: "ceph status". However, when we run "ceph orch status" the command hangs forever. Are there other commands that we can run for a more thorough health check of the cluster? After looking at:
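
(Some generic follow-up checks, sketched under the assumption that the cluster is cephadm-managed; the active MGR name is a placeholder:)

  ceph health detail           # more detail than 'ceph status'
  ceph crash ls                # any daemon crashes recorded since the outage?
  ceph mgr module ls           # is the 'cephadm' orchestrator module still enabled?
  ceph mgr fail <active-mgr>   # fail over to a standby MGR so the orchestrator module reloads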

[ceph-users] Re: Suitable 10G Switches for ceph storage - any recommendations?

2021-05-19 Thread mj
Hi Hermann, Yes, I asked the same question a while ago and received very valuable advice. We ended up purchasing dual refurbished 40G Aristas, for very little money compared to new 10G switches. Ours are these: https://emxcore.com/shop/category/product/arista-dcs-7050qx-32s/ The complete

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2021-05-19 Thread Konstantin Shalygin
Dan, Igor, It seems this wasn't backported? We still get stored == used after a Luminous -> Nautilus 14.2.21 upgrade. What is the solution? Find the OSDs reporting zero bytes and drain/redeploy them? Thanks, k

[ceph-users] Re: rbd-nbd crashes Error: failed to read nbd request header: (33) Numerical argument out of domain

2021-05-19 Thread Mykola Golub
On Wed, May 19, 2021 at 11:32:04AM +0800, Zhi Zhang wrote: > On Wed, May 19, 2021 at 11:19 AM Zhi Zhang wrote: >> On Tue, May 18, 2021 at 10:58 PM Mykola Golub wrote: >>> Could you please provide the full rbd-nbd log? If it is too large for the attachment then may be

[ceph-users] Re: BlueFS spillover detected - 14.2.16

2021-05-19 Thread Konstantin Shalygin
Hi Toby, On 19 May 2021, at 15:24, Toby Darling wrote: > In the last couple of weeks we've been getting BlueFS spillover warnings on multiple (>10) osds, eg > BLUEFS_SPILLOVER BlueFS spillover detected on 1 OSD(s) > osd.327 spilled over 58 MiB metadata from 'db' device (30 GiB used

[ceph-users] Re: Pool has been deleted before snaptrim finished

2021-05-19 Thread Szabo, Istvan (Agoda)
Also a question: how can I identify this issue? I have 3 high-performing clusters with NVMe for WAL and RocksDB backing the SSDs (3 SSD / 1 NVMe), and many months ago it started smashing the NVMes. We are using it as objectstore. I would assume the issue is the same as with our old cluster, but is there a way

[ceph-users] BlueFS spillover detected - 14.2.16

2021-05-19 Thread Toby Darling
Hi, In the last couple of weeks we've been getting BlueFS spillover warnings on multiple (>10) osds, eg:

BLUEFS_SPILLOVER BlueFS spillover detected on 1 OSD(s)
    osd.327 spilled over 58 MiB metadata from 'db' device (30 GiB used of 66 GiB) to slow device

I know this can be corrected with
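
(Two commonly used remedies, sketched here as an assumption rather than a confirmed fix for this cluster; the OSD id and paths are examples:)

  ceph tell osd.327 compact   # online RocksDB compaction; may only clear the warning temporarily
  # or, with the OSD stopped, move the spilled BlueFS data back onto the db device:
  systemctl stop ceph-osd@327
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-327 --devs-source /var/lib/ceph/osd/ceph-327/block --dev-target /var/lib/ceph/osd/ceph-327/block.db bluefs-bdev-migrate
  systemctl start ceph-osd@327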

[ceph-users] Re: Suitable 10G Switches for ceph storage - any recommendations?

2021-05-19 Thread Max Vernimmen
Hermann, I think there was a discussion on recommended switches not too long ago. You should be able to find it in the mailing list archives. I think the latency of the network is usually very minor compared to ceph's dependency on cpu and disk latency, so for a simple cluster I wouldn't worry

[ceph-users] Re: remove host from cluster for re-installing it

2021-05-19 Thread Eugen Block
Hi, the docs [1] cover that part, there's not much to it really. The easiest way is probably to export the current configuration (and keep it as backup): ceph orch ls --export --format yaml > cluster.yml Make a copy and then edit the yaml file, remove the node from the placement
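
(A hedged sketch of the workflow described here; <host> is a placeholder and the exact spec edits depend on the cluster:)

  ceph orch ls --export --format yaml > cluster.yml   # export the current service specs, keep as backup
  # edit a copy of cluster.yml and remove the node from the placement sections, then:
  ceph orch apply -i cluster.yml
  ceph orch host rm <host>                            # finally remove the host from the orchestrator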

[ceph-users] MDS process large memory consumption

2021-05-19 Thread Andres Rojas Guerrero
Hi all, I have observed that in a Nautilus (14.2.6) cluster the mds process on the MDS server consumes a large amount of memory; for example, on an MDS server with 128 GB of RAM the mds process is consuming ~80 GB: ceph 20 0 78,8g 77,1g 13772 S 4,0 61,5
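
(A sketch of the usual knobs to check first, assuming the memory is going to the MDS cache; the daemon name and the limit value are only examples:)

  ceph daemon mds.<name> cache status                      # how much of the RSS is actually cache?
  ceph config get mds mds_cache_memory_limit               # current configured cache limit
  ceph config set mds mds_cache_memory_limit 17179869184   # e.g. 16 GiB; adjust to the available RAM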

[ceph-users] Suitable 10G Switches for ceph storage - any recommendations?

2021-05-19 Thread Hermann Himmelbauer
Dear Ceph users, I am currently constructing a small hyperconverged Proxmox cluster with ceph as storage. So far I always had 3 nodes, which I directly linked together via 2 bonded 10G network interfaces for the Ceph storage, so I never needed any switching devices. This new cluster has more

[ceph-users] Re: MDS rank 0 damaged after update to 14.2.20

2021-05-19 Thread Eugen Block
Hi, I looked a little closer into what happened yesterday during the update, I'll summarize for documentation purposes, maybe it helps other users in this situation (sorry for the long post): The update process started at around 8 am. The MONs/MGRs updated successfully, I restarted the OSD

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-19 Thread Boris Behrens
This helped: https://tracker.ceph.com/issues/44509

$ systemctl stop ceph-osd@68
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source /var/lib/ceph/osd/ceph-68/block --dev-target /var/lib/ceph/osd/ceph-68/block.db bluefs-bdev-migrate
$ systemctl start ceph-osd@68

Thanks a lot for
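
(A quick way to verify afterwards that the OSD picked up the new db device, sketched as an assumption:)

  ceph osd metadata 68 | grep -i bluefs                            # 'bluefs_dedicated_db' should now report 1
  ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-68  # should list labels for both block and block.db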