[ceph-users] New Issue - Mapping Block Devices

2021-03-22 Thread duluxoz
Hi All, I've got a new issue (hopefully this one will be the last). I have a working Ceph (Octopus) cluster with a replicated pool (my-pool), an erasure-coded pool (my-pool-data), and an image (my-image) created - all *seems* to be working correctly. I also have the correct Keyring specified
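For context, a hedged sketch of the usual mapping steps for an image backed by an erasure-coded data pool (pool and image names are taken from the post; the client name and keyring path are assumptions):

```shell
# Assumed layout: replicated metadata pool "my-pool", EC data pool
# "my-pool-data", image "my-image" created with --data-pool my-pool-data.
# Map with an explicit client name; its keyring must be readable,
# e.g. /etc/ceph/ceph.client.admin.keyring (path is an assumption).
rbd map my-pool/my-image --id admin

# If the kernel client rejects the map, check and trim image features:
rbd info my-pool/my-image
rbd feature disable my-pool/my-image object-map fast-diff deep-flatten
```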

[ceph-users] How to know which client holds the lock of a file

2021-03-22 Thread Norman.Kern
Hi, Does anyone know how to find which client holds the lock on a file in CephFS? I ran into a deadlock where a client is blocked waiting to acquire the lock, but I don't know which client holds it. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
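One way to track down the holder, sketched with placeholder daemon names and ids (run the `ceph daemon` command on the host of the active MDS):

```shell
# Dump the operations currently blocked on locks at the active MDS:
ceph daemon mds.<name> dump_blocked_ops

# List client sessions to match the client id seen in the blocked op:
ceph tell mds.0 client ls

# As a last resort, evict the client holding the lock (disruptive):
ceph tell mds.0 client evict id=<client-id>
```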

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Sam Skipsey
Hi Dan: Aha - I think the first commit is probably it - before that commit, the fact that lo is highest in the interfaces enumeration didn't matter for us [since it would always be skipped]. This actually almost certainly also is associated with that other site with a similar problem (OSDs drop

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Dan van der Ster
There are two commits between 14.2.16 and 14.2.18 related to loopback network. Perhaps one of these is responsible for your issue [1]. I'd try playing with the options like cluster/public bind addr and cluster/public bind interface until you can convince the osd to bind to the correct listening
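A hedged example of the knobs mentioned above, applied via the config database (the networks are placeholders for the cluster's real subnets):

```shell
# Pin the public/cluster networks so OSDs don't bind to the loopback:
ceph config set osd public_network 10.0.0.0/24
ceph config set osd cluster_network 10.0.1.0/24

# Alternatively, set "public network" / "cluster network" under [osd]
# in the OSD host's ceph.conf, then restart the OSDs.
```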

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Sam Skipsey
I don't think we explicitly set any ms settings in the OSD host ceph.conf [all the OSDs ceph.confs are identical across the entire cluster]. ip a gives: ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Dan van der Ster
Which `ms` settings do you have in the OSD host's ceph.conf or the ceph config dump? And how does `ip a` look on one of these hosts where the osd is registering itself as 127.0.0.1? You might as well set nodown again now. This will make ops pile up, but that's the least of your concerns at the

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Sam Skipsey
Hm, yes it does [and I was wondering why loopbacks were showing up suddenly in the logs]. This wasn't happening with 14.2.16 so what's changed about how we specify stuff? This might correlate with the other person on the IRC list who has problems with 14.2.18 and their OSDs deciding they don't

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Dan van der Ster
What's with the OSDs having loopback addresses? E.g. v2: 127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667 Does `ceph osd dump` show those same loopback addresses for each OSD? This sounds familiar... I'm trying to find the recent ticket. .. dan On Mon, Mar 22, 2021, 6:07 PM Sam Skipsey
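The check being suggested can be sketched as (the OSD id is a placeholder):

```shell
# Show the registered addresses of every OSD and look for loopbacks:
ceph osd dump | grep 127.0.0.1

# Inspect a single OSD's registered addresses and host:
ceph osd find 12
```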

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Sam Skipsey
Hi Dan: So, unsetting nodown results in... almost all of the OSDs being marked down (231 down out of 328). Checking the actual OSD services, most of them were actually up and active on the nodes, even when the mons had marked them down. (On a few nodes, the down services corresponded to OSDs

[ceph-users] DocuBetter Meeting -- APAC 25 Mar 2021 0100 UTC

2021-03-22 Thread John Zachary Dover
There will be a DocuBetter meeting on Thursday, 25 Mar 2021 at 0100 UTC. We will discuss the Google Season of Docs proposal (the Comprehensive Contribution Guide), the rewriting of the cephadm documentation, and the new section of the Teuthology Guide. DocuBetter Meeting -- APAC 25 Mar 2021 0100

[ceph-users] March 2021 Tech Talk and Code Walk-through

2021-03-22 Thread Mike Perez
Hi everyone! I'm excited to announce two talks we have on the schedule for March 2021: Persistent Bucket Notifications By Yuval Lifshitz https://ceph.io/ceph-tech-talks/ The stream starts on March 25th at 17:00 UTC / 18:00 CET / 1:00 PM EDT / 10:00 AM PDT Persistent bucket notifications are

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Dan van der Ster
Hi, I would unset nodown (hiding osd failures) and norecover (blocking PGs from recovering degraded objects), then start bringing the OSDs up. As soon as you have some osd logs reporting some failures, then share those... - Dan On Mon, Mar 22, 2021 at 3:49 PM Sam Skipsey wrote: > > So, we started the
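The flag changes described above, as commands:

```shell
ceph osd unset nodown      # stop hiding OSD failure reports
ceph osd unset norecover   # allow PGs to recover degraded objects
ceph osd stat              # watch the up/in counts as OSDs start
```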

[ceph-users] Device class not deleted/set correctly

2021-03-22 Thread Nico Schottelius
Hello, follow up from my mail from 2020 [0], it seems that OSDs sometimes have "multiple classes" assigned: [15:47:15] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush rm-device-class osd.4 done removing class of osd(s): 4 [15:47:17] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd
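The usual sequence for resetting a device class, using osd.4 from the post (the target class "ssd" here is an assumption):

```shell
ceph osd crush rm-device-class osd.4
ceph osd crush set-device-class ssd osd.4

# Verify, including the shadow hierarchy where stale classes would show up:
ceph osd crush tree --show-shadow
```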

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Sam Skipsey
So, we started the mons and mgr up again, and here's the relevant logs, including also ceph versions. We've also turned off all of the firewalls on all of the nodes so we know that there can't be network issues [and, indeed, all of our management of the OSDs happens via logins from the service

[ceph-users] Ceph User Survey Working Group - Next Steps

2021-03-22 Thread Mike Perez
Hi everyone, We are approaching the April 2nd deadline in two weeks, so we should start proposing the next meeting to plan the survey results. Anybody in the community is welcome to join the Ceph Working Groups. Please add your name to: https://ceph.io/user-survey/ I have started a doodle:

[ceph-users] how to disable write-back mode in ceph octopus

2021-03-22 Thread 无名万剑归宗
I tried cache tiering in write-back mode in my cluster, but because my SSD drives are consumer-grade, they cannot satisfy the IOPS requirements. Now I want to disable write-back mode. I found the official documentation, but it was outdated
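For reference, the documented removal sequence is roughly the following (pool names are placeholders; the flush can take a long time on a busy tier):

```shell
# Stop absorbing new writes into the cache; proxy them to the base pool:
ceph osd tier cache-mode hot-pool proxy --yes-i-really-mean-it

# Flush and evict everything still held in the cache pool:
rados -p hot-pool cache-flush-evict-all

# Detach the tier from the base pool:
ceph osd tier remove-overlay base-pool
ceph osd tier remove base-pool hot-pool
```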

[ceph-users] Re: Question about migrating from iSCSI to RBD

2021-03-22 Thread Justin Goetz
Hey Rich! Appreciate the info. This did work successfully! Just wanted to share my experience in case others run into a similar situation: First step, I disabled the tcmu-runner process on all 3 of our previous iSCSI gateway nodes. Then from our MONs, I confirmed there were no current locks
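For anyone repeating this migration, the lock check can be sketched as (the image spec is a placeholder):

```shell
# List any advisory locks still held on the image by the old gateways:
rbd lock ls my-pool/my-image

# Remove a stale lock by the id and locker printed by "lock ls":
rbd lock rm my-pool/my-image "<lock-id>" "<locker>"
```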

[ceph-users] Re: [Suspicious newsletter] RGW: Multiple Site does not sync olds data

2021-03-22 Thread 特木勒
Thank you~ I will try to upgrade the cluster too. Seems like this is the only way for now.  I will let you know once I complete testing. :) Have a good day Szabo, Istvan (Agoda) wrote on Mon, 22 Mar 2021 at 3:38 PM: > Yeah, doesn't work. Last week they fixed my problem ticket which caused > the crashes, and

[ceph-users] Re: Incomplete pg , any chance to to make it survive or data loss :( ?

2021-03-22 Thread Szabo, Istvan (Agoda)
Some news: since the ceph pg inactive listing reported that 0 objects are in this pg, I marked it complete on the primary osd, and now objects are unfound. Now I'm stuck again  [WRN] OBJECT_UNFOUND: 4/58369044 objects unfound (0.000%) pg 44.1aa has 4 unfound objects [ERR] PG_DAMAGED:
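The commands usually involved at this stage, sketched with the pg id from the post (note that on erasure-coded pools only "delete" is available; "revert" works on replicated pools):

```shell
# Show which objects the cluster considers unfound in this PG:
ceph pg 44.1aa list_unfound

# Give up on the unfound objects; "revert" rolls back to a prior
# version where possible, "delete" discards them:
ceph pg 44.1aa mark_unfound_lost delete
```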

[ceph-users] Re: How to sizing nfs-ganesha.

2021-03-22 Thread Daniel Gryniewicz
Hi. Unfortunately, there isn't a good guide for sizing Ganesha. It's pretty light weight, and so the machines it needs are generally smaller than what Ceph needs, so you probably won't have much of a problem. The scaling of Ganesha is in 2 factors, based on the workload involved: the CPU

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Sam Skipsey
Hi Dan: Thanks for the reply - at present, our mons and mgrs are off [because of the unsustainable nature of the filesystem usage]. We'll try putting them on again for long enough to get "ceph status" out of them, but because the mgr was unable to actually talk to anything, and reply at that

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Dan van der Ster
Hi Sam, The daemons restart (for *some* releases) because of this: https://tracker.ceph.com/issues/21672 In short, if the selinux module changes, and if you have selinux enabled, then midway through yum update, there will be a systemctl restart ceph.target issued. For the rest -- I think you

[ceph-users] Incomplete pg , any chance to to make it survive or data loss :( ?

2021-03-22 Thread Szabo, Istvan (Agoda)
Hi, What can I do with this pg to make it work? We lost OSDs 61 and 122, but we still have 32, 33, and 70. I've exported the pg chunks from them, but they are very small, and when I imported one back into another osd, that osd never started again, so I had to remove that chunk (44.1aas2,
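The export/import workflow being described, sketched with placeholder OSD ids and paths (both OSDs must be stopped while the tool runs; the shard id is taken from the post):

```shell
# On a surviving OSD, export the PG shard:
systemctl stop ceph-osd@70
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-70 \
    --pgid 44.1aas2 --op export --file /tmp/pg44.1aa.export

# On the (stopped) target OSD, import the shard, then restart it:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-99 \
    --op import --file /tmp/pg44.1aa.export
systemctl start ceph-osd@99
```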

[ceph-users] Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-22 Thread Sam Skipsey
Hi everyone: I posted to the list on Friday morning (UK time), but apparently my email is still in moderation (I have an email from the list bot telling me that it's held for moderation but no updates). Since this is a bit urgent - we have ~3PB of storage offline - I'm posting again. To save

[ceph-users] Re: [Suspicious newsletter] RGW: Multiple Site does not sync olds data

2021-03-22 Thread 特木勒
Hi Istvan: Do you have any update on directional sync? I am trying to upgrade the cluster to 15.2.10 to see if the problem is solved. :( Thanks Szabo, Istvan (Agoda) wrote on Mon, 1 Mar 2021 at 10:01 AM: > So-so. I had some interruption so it failed on one site, but the other is > kind of working. This is the

[ceph-users] Re: [Suspicious newsletter] RGW: Multiple Site does not sync olds data

2021-03-22 Thread 特木勒
Hi Istvan: Any update on directional sync? I am trying to upgrade Ceph to 15.2.10. Also, I have an issue where RGW may crash after I run data sync init. :( Thanks