[ceph-users] Re: OSD stop and fails

2021-08-30 Thread Amudhan P
Gregory, I have raised a ticket already. https://tracker.ceph.com/issues/52445 Amudhan On Tue, Aug 31, 2021 at 12:00 AM Gregory Farnum wrote: > Hmm, this ceph_assert hasn't shown up in my email before. It looks > like there may be a soft-state bug in Octopus. Can you file a ticket > at

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-30 Thread Paul Giralt (pgiralt)
On Aug 30, 2021, at 7:14 PM, Xiubo Li wrote: We are using “Most Recently Used” - however there are 50 ESXi hosts all trying to access the same data stores, so it’s very possible that one host is choosing iSCSI gateway 1 and another host is choosing iSCSI gateway

[ceph-users] Cephadm cannot acquire lock

2021-08-30 Thread fcid
Hi ceph community, I'm having some trouble trying to delete an OSD. I've been using cephadm in one of our clusters and it works fine, but lately, after an OSD failure, I cannot delete it using the orchestrator. Since the orchestrator is not working (for some unknown reason) I tried to
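For context, the normal orchestrator removal flow looks roughly like the sketch below; the OSD id is a placeholder and the exact flags depend on whether the disk should be kept for a replacement.

    ceph orch osd rm 12 --replace --force   # 12 is a placeholder OSD id
    ceph orch osd rm status                 # watch draining/removal progress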

[ceph-users] Network issues with a CephFS client mount via a Cloudstack instance

2021-08-30 Thread Jeremy Hansen
I’m going to post this to the Cloudstack list as well. Attempting to rsync a large file to the Ceph volume, the instance becomes unresponsive at the network level. It eventually returns but it will continually drop offline as the file copies. Dmesg shows this on the Cloudstack host

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-08-30 Thread Frank Schilder
The MDS cluster came back up again, but I lost a number of standby MDS daemons. I cleared the OSD blacklist, but they do not show up as stand-by daemons again. The daemon itself is running, but does not seem to re-join the cluster. The log shows: 2021-08-30 21:32:34.896 7fc9e22f8700 1
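A rough sketch of the checks involved, assuming a systemd-managed MDS; the daemon name is a placeholder:

    ceph osd blacklist ls                # confirm the MDS addresses really were removed
    ceph fs status con-fs2               # see which ranks and standby daemons are registered
    systemctl restart ceph-mds@<name>    # restart the daemon so it re-registers as standby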

[ceph-users] MDS daemons stuck in resolve, please help

2021-08-30 Thread Frank Schilder
Hi all, our MDS cluster got degraded after an MDS had an oversized cache and crashed. Other MDS daemons followed suit and now they are stuck in this state: [root@gnosis ~]# ceph fs status con-fs2 - 1640 clients [rank/state table truncated]

[ceph-users] Re: mon startup problem on upgrade octopus to pacific

2021-08-30 Thread Chris Dunlop
Hi, Does anyone have any suggestions? Thanks, Chris On Mon, Aug 30, 2021 at 03:52:29PM +1000, Chris Dunlop wrote: Hi, I'm stuck, mid upgrade from octopus to pacific using cephadm, at the point of upgrading the mons. I have 3 mons still on octopus and in quorum. When I try to bring up a

[ceph-users] Re: Howto upgrade AND change distro

2021-08-30 Thread Reed Dier
I think it will depend on how you have your OSDs deployed currently. If they are bluestore deployed via ceph-volume using LVM, then it should mostly be pretty painless to migrate them to a new host, assuming everything is on the OSDs. The corner case would be if the WAL/DB is on a separate
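As a hedged illustration of that LVM path: once the new host has the ceph packages and cluster keyrings in place, the existing OSDs can usually be brought up from their LVM tags, e.g.:

    ceph-volume lvm list               # shows OSD ids and whether a DB/WAL LV lives on another device
    ceph-volume lvm activate --all     # recreates the systemd units and starts the OSDs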

[ceph-users] Re: radosgw manual deployment

2021-08-30 Thread Francesco Piraneo G.
Hi Eugen, everything worked fine in my tests until I decided to move the RADOS gateway to a different host than the mon. In that case the dashboard is no longer able to find the RADOS gateway daemon; on my dashboard I have this message: The Object Gateway Service is not configured No RGW
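A hedged sketch of pointing the dashboard at an RGW running on another host; host, port and key files are placeholders, and the exact commands (and whether keys are read from a file with -i) vary a little between releases:

    ceph dashboard set-rgw-api-host rgw1.example.com
    ceph dashboard set-rgw-api-port 8080
    ceph dashboard set-rgw-api-access-key -i access_key.txt
    ceph dashboard set-rgw-api-secret-key -i secret_key.txt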

[ceph-users] Re: Howto upgrade AND change distro

2021-08-30 Thread Michal Strnad
Hi Fleg. We have reinstalled a bunch of ceph nodes. You basically have to do the following steps: 1. Back up the /etc and /var/lib directories with configurations 2. Reinstall the server with the new OS (for example CentOS 8 Stream) and use the same disks 3. Install ceph packages and restore /etc/ceph and
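A minimal sketch of steps 1 and 3, assuming a package-based (non-cephadm) deployment and that the archive is copied off the host before the reinstall; paths are illustrative only:

    # before the reinstall
    tar czf ceph-host-backup.tar.gz /etc/ceph /var/lib/ceph
    scp ceph-host-backup.tar.gz backup-host:
    # after the new OS and ceph packages are installed
    tar xzf ceph-host-backup.tar.gz -C /
    systemctl start ceph.target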

[ceph-users] Re: OSD stop and fails

2021-08-30 Thread Gregory Farnum
Hmm, this ceph_assert hasn't shown up in my email before. It looks like there may be a soft-state bug in Octopus. Can you file a ticket at tracker.ceph.com with the backtrace and osd log file? We can direct that to the RADOS team to check out. -Greg On Sat, Aug 28, 2021 at 7:13 AM Amudhan P

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-30 Thread Frank Schilder
Dear Dan and Patrick, I have the suspicion that I'm looking at large directories in snapshots that no longer exist on the file system. Hence, the omap objects are not fragmented as explained in the tracker issue. Here is the info as you asked me to pull out: > find /cephfs

[ceph-users] ceph orch commands stuck

2021-08-30 Thread Oliver Weinmann
Hi, we had one failed OSD in our cluster that we have replaced. Since then the cluster is behaving very strangely and some ceph commands like ceph crash or ceph orch are stuck. Cluster health: [root@gedasvl98 ~]# ceph -s cluster: id: ec9e031a-cd10-11eb-a3c3-005056b7db1f

[ceph-users] Re: Missing OSD in SSD after disk failure

2021-08-30 Thread David Orman
I may have misread your original email, for which I apologize. If you do a 'ceph orch device ls' does the NVME in question show available? On that host with the failed OSD, if you lvs/lsblk do you see the old DB on the NVME still? I'm not sure if the replacement process you followed will work.
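Hedged examples of those checks; the grep pattern relies on the LV tags ceph-volume normally sets, and device/VG names will differ per host:

    ceph orch device ls                                   # is the NVMe listed as available?
    lsblk                                                 # any leftover LVs still sitting on the NVMe?
    lvs -o lv_name,vg_name,lv_tags | grep ceph.type=db    # DB LVs created by ceph-volume carry this tag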

[ceph-users] Ceph User Survey 2022 Planning

2021-08-30 Thread Mike Perez
Hi all, It's that time again for us to plan the next Ceph User Survey for 2022! Here's our 2021 survey and questions: https://tracker.ceph.com/attachments/download/5378/Ceph%20User%20Survey%202021.pdf Here are the 2021 results: https://ceph.io/en/news/blog/2021/2021-ceph-user-survey-results/

[ceph-users] Re: Howto upgrade AND change distro

2021-08-30 Thread Francois Legrand
Thanks. My point is how to safely reattach an OSD from the previous server to the newly installed distro! Is there a detailed howto to completely reinstall a server (or a cluster)? F. On 27/08/2021 at 19:47, Message: 1 Date: Fri, 27 Aug 2021 16:43:12 +0100 From: Matthew Vernon Subject:

[ceph-users] Re: Brand New Cephadm Deployment, OSDs show either in/down or out/down

2021-08-30 Thread Alcatraz
Sebastian, Thanks for responding! And of course. 1. ceph orch ls --service-type osd --format yaml Output: service_type: osd service_id: all-available-devices service_name: osd.all-available-devices placement:   host_pattern: '*' unmanaged: true spec:   data_devices:     all: true  

[ceph-users] cephadm Pacific bootstrap hangs waiting for mon

2021-08-30 Thread Matthew Pounsett
I'm just getting started with Pacific, and I've run into this problem trying to get bootstrapped. cephadm is waiting for the mon to start, and waiting, and waiting ... checking docker ps it looks like it's running, but I guess it's never finishing its startup tasks? I waited about 30 minutes

[ceph-users] Re: Very beginner question for cephadm: config file for bootstrap and osd_crush_chooseleaf_type

2021-08-30 Thread Sebastian Wagner
Try running `cephadm bootstrap --single-host-defaults` On 20.08.21 at 18:23, Eugen Block wrote: Hi, you can just set the config option with 'ceph config set ...' after your cluster has been bootstrapped. See [1] for more details about the config store. [1]
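A hedged sketch of both suggestions; the mon IP is a placeholder:

    cephadm bootstrap --mon-ip 192.0.2.10 --single-host-defaults
    # or, after a normal bootstrap, adjust the option for a single-node test cluster
    ceph config set global osd_crush_chooseleaf_type 0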

[ceph-users] Re: Brand New Cephadm Deployment, OSDs show either in/down or out/down

2021-08-30 Thread Sebastian Wagner
Could you run 1. ceph orch ls --service-type osd --format yaml 2. ceph orch ps --daemon-type osd --format yaml 3. try running the `ceph auth add` call from https://docs.ceph.com/en/mimic/rados/operations/add-or-rm-osds/#adding-an-osd-manual On 30.08.21 at 14:49, Alcatraz wrote: Hello
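For step 3, the auth call on that page looks roughly like this; the OSD number and keyring path are placeholders:

    ceph auth add osd.3 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-3/keyring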

[ceph-users] Re: ceph orch commands stuck

2021-08-30 Thread Burkhard Linke
Hi, On 30.08.21 15:36, Oliver Weinmann wrote: Hi, we had one failed osd in our cluster that we have replaced. Since then the cluster is behaving very strange and some ceph commands like ceph crash or ceph orch are stuck. Just two unrelated thoughts: - never use two mons. If one of

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-30 Thread Paul Giralt (pgiralt)
Inline… Usually there shouldn't be so many entries, and we have added some patches to fix this. When the exclusive lock is broken by a new gateway, the previous one will be added to the blocklist by ceph. And in tcmu-runner, when the previous gateway detects that it has been added to the blocklist

[ceph-users] Re: Adding a new monitor causes cluster freeze

2021-08-30 Thread Daniel Nagy (Systec)
Dan, you saved the day - that tunable helped, and now we have 3 mons again. Thank you! We will definitely upgrade to nautilus at least. Thanks again! From: Dan van der Ster Sent: Monday, August 30, 2021 10:48 To: Daniel Nagy (Systec) Cc: ceph-users@ceph.io

[ceph-users] Brand New Cephadm Deployment, OSDs show either in/down or out/down

2021-08-30 Thread Alcatraz
Hello all, Running into some issues trying to build a virtual PoC for Ceph. Went to my cloud provider of choice and spun up some nodes. I have three identical hosts consisting of: Debian 10 8 cpu cores 16GB RAM 1x315GB Boot Drive 3x400GB Data drives After deploying Ceph (v 16.2.5) using

[ceph-users] Re: Adding a new monitor causes cluster freeze

2021-08-30 Thread Daniel Nagy (Systec)
During deployment we checked https://docs.ceph.com/en/mimic/start/os-recommendations/ which recommends at least a 4.x kernel. The kernel-ml repo already had 5.x at that time, so we chose that instead. From: Szabo, Istvan (Agoda) Sent: Monday, August 30, 2021 10:36 To: Daniel

[ceph-users] Re: Adding a new monitor causes cluster freeze

2021-08-30 Thread Szabo, Istvan (Agoda)
Any reason to use kernel 5 rather than 3? Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- On 2021. Aug 30., at

[ceph-users] Adding a new monitor causes cluster freeze

2021-08-30 Thread Daniel Nagy (Systec)
Hi, We have a mimic cluster (I know it is EOL, but cannot upgrade because of the following issue...) with 3 mons. One of them was rebooted and cannot join back. When it starts, the whole cluster is 'stuck', until I kill the joining mon process. Even a 'ceph -s' cannot be run during that period

[ceph-users] Re: rbd-nbd crashes Error: failed to read nbd request header: (33) Numerical argument out of domain

2021-08-30 Thread Ilya Dryomov
On Mon, Aug 30, 2021 at 1:06 PM Yanhu Cao wrote: > > Hi Ilya, > > Recently, we found these patches(v2) > http://archive.lwn.net:8080/linux-kernel/YRHa%2FkeJ4pHP3hnL@T590/T/. > Maybe related? > > v3: > https://lore.kernel.org/linux-block/20210824141227.808340-2-yuku...@huawei.com/ It doesn't

[ceph-users] Re: rbd-nbd crashes Error: failed to read nbd request header: (33) Numerical argument out of domain

2021-08-30 Thread Yanhu Cao
Hi Ilya, Recently, we found these patches(v2) http://archive.lwn.net:8080/linux-kernel/YRHa%2FkeJ4pHP3hnL@T590/T/. Maybe related? v3: https://lore.kernel.org/linux-block/20210824141227.808340-2-yuku...@huawei.com/ On Mon, Aug 30, 2021 at 6:34 PM Ilya Dryomov wrote: > > On Tue, Aug 24, 2021 at

[ceph-users] Re: A practical approach to efficiently store 100 billions small objects in Ceph

2021-08-30 Thread Loïc Dachary
Bonjour, In the past months benchmarks were written[0] and run in the grid5000 cluster[1] to verify there was no blocker with the proposed approach[2]. The results were published today[3]. In the following weeks the Software Heritage codebase will be modified to implement the design[4].

[ceph-users] Re: rbd-nbd crashes Error: failed to read nbd request header: (33) Numerical argument out of domain

2021-08-30 Thread Ilya Dryomov
On Tue, Aug 24, 2021 at 11:43 AM Yanhu Cao wrote: > > Any progress on this? We have encountered the same problem, use the > rbd-nbd option timeout=120. > ceph version: 14.2.13 > kernel version: 4.19.118-2+deb10u1 Hi Yanhu, No, we still don't know what is causing this. If rbd-nbd is being too
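For reference, the timeout mentioned above is passed at map time, roughly as below; pool and image names are placeholders, and newer releases rename the flag to --io-timeout:

    rbd-nbd map --timeout 120 rbd/myimage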

[ceph-users] Re: Replacing swift with RGW

2021-08-30 Thread Etienne Menguy
Hi, there is some information in the Ceph documentation: https://docs.ceph.com/en/latest/radosgw/keystone/ - Use keystone as auth for RGW - Create a service and register your RGW as swift Étienne > On 27 Aug 2021, at 15:47, Michel Niyoyita
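A hedged sketch of those two steps; URLs, credentials, project and role names are all placeholders:

    # RGW side (ceph.conf or 'ceph config set'): validate tokens against keystone
    rgw_keystone_url = http://keystone.example.com:5000
    rgw_keystone_api_version = 3
    rgw_keystone_admin_user = swift
    rgw_keystone_admin_password = secret
    rgw_keystone_admin_domain = Default
    rgw_keystone_admin_project = service
    rgw_keystone_accepted_roles = member,admin

    # keystone side: register RGW as the swift endpoint
    openstack service create --name swift --description "Object Storage" object-store
    openstack endpoint create --region RegionOne swift public http://rgw.example.com:8080/swift/v1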

[ceph-users] Re: Adding a new monitor causes cluster freeze

2021-08-30 Thread Dan van der Ster
This sounds a lot like: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/ See the discussion about mon_sync_max_payload_size, and the PR that fixed this at some point in nautilus. (https://github.com/ceph/ceph/pull/31581) It probably was never fixed
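The workaround discussed there is to shrink the mon sync payload before letting the third mon join again; 4096 is the value commonly suggested for this symptom, not a hard rule:

    ceph config set mon mon_sync_max_payload_size 4096
    # or in ceph.conf under [mon]:  mon sync max payload size = 4096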

[ceph-users] Fwd: [lca-announce] linux.conf.au 2022 - Call for Sessions now open!

2021-08-30 Thread Tim Serong
One week left for talk submissions for linux.conf.au 2022 (virtualized for the second time, January 14-16 2022). Forwarded Message Subject: [lca-announce] linux.conf.au 2022 - Call for Sessions now open! Date: Tue, 10 Aug 2021 08:07:01 +1000 From: linux.conf.au Announcements