[ceph-users] quay.io vs quay.ceph.io for container images

2021-09-07 Thread Linh Vu
Hi all, seeking some clarification regarding the new official Ceph container registry... I used to deploy Ceph in containers via ceph-ansible from the official docker.io registry (the default in ceph-ansible up to now) using the ceph/daemon image, i.e. this: https://hub.docker.com/r/ceph/daemon
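
For context, pointing ceph-ansible at a different registry is normally done by overriding the image variables in group_vars. A minimal sketch (variable names as found in ceph-ansible's group_vars/all.yml.sample; the registry and tag values below are assumptions, not a recommendation from this thread):

    # group_vars/all.yml -- use quay.io instead of the docker.io default
    ceph_docker_registry: quay.io
    ceph_docker_image: ceph/daemon
    ceph_docker_image_tag: latest-pacific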

[ceph-users] Re: Data loss on appends, prod outage

2021-09-07 Thread Frank Schilder
Hi Nathan, > Is this the bug you are referring to? https://tracker.ceph.com/issues/37713 Yes, it's one of them. I believe there were more such reports. > The main prod filesystems are home > directories for hundreds of interactive users using clustered > machines, ... That's exactly what we

[ceph-users] Re: Data loss on appends, prod outage

2021-09-07 Thread Frank Schilder
Hi Nathan, could be a regression. The write append bug was a known issue for older kernel clients. I can try to find the link. We have one of the affected kernel versions and asked our users to use a single node for all writes to a file. In general, for distributed/parallel file systems, this

[ceph-users] Re: Edit crush rule

2021-09-07 Thread Budai Laszlo
Hi Rich, Nathan, Thank you for your answers. Yes, I'm aware of this option, but this is not changing the failure domain of an existing rule. I was wondering whether the CLI would permit that change. It looks like it doesn't. Thanks again for your time! Laszlo On 9/8/21 12:42 AM, Richard Bade

[ceph-users] Re: Edit crush rule

2021-09-07 Thread Richard Bade
Hi Budai, I agree with Nathan, just switch the crush rule. I've recently done this on one of our clusters. Create a new crush rule the same as your old one except with different failure domain. Then use: ceph osd pool set {pool_name} crush_rule {new_rule_name} Very easy. This may kick off some
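
A minimal sketch of the approach Richard describes (rule and pool names are placeholders; the example assumes moving from an osd to a host failure domain):

    # create a new replicated rule with the desired failure domain
    ceph osd crush rule create-replicated replicated_host default host
    # point the pool at the new rule; data movement may start immediately
    ceph osd pool set mypool crush_rule replicated_host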

[ceph-users] Re: Edit crush rule

2021-09-07 Thread Nathan Fish
I believe you would create a new rule and switch? On Tue, Sep 7, 2021 at 3:46 PM Budai Laszlo wrote: > > Dear all, > > is there a way to change the failure domain of a CRUSH rule using the CLI? > > I know I can do that by editing the crush map. I'm curious if there is a "CLI > way"? > > Thank

[ceph-users] Edit crush rule

2021-09-07 Thread Budai Laszlo
Dear all, is there a way to change the failure domain of a CRUSH rule using the CLI? I know I can do that by editing the crush map. I'm curious if there is a "CLI way"? Thank you, Laszlo
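
For reference, the manual route Laszlo mentions (editing the CRUSH map) usually looks roughly like this; a sketch, not taken from the thread, with placeholder file names:

    ceph osd getcrushmap -o crushmap.bin        # export the compiled map
    crushtool -d crushmap.bin -o crushmap.txt   # decompile to editable text
    # edit the rule's "step chooseleaf firstn 0 type ..." line, then:
    crushtool -c crushmap.txt -o crushmap_new.bin
    ceph osd setcrushmap -i crushmap_new.bin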

[ceph-users] Re: Data loss on appends, prod outage

2021-09-07 Thread Nathan Fish
Thank you for your reply. The main prod filesystems are home directories for hundreds of interactive users using clustered machines, so we cannot really control the write patterns. In the past, metadata performance has indeed been the bottleneck, but it was still quite fast enough. Is this the

[ceph-users] Data loss on appends, prod outage

2021-09-07 Thread Nathan Fish
As of this morning, when two CephFS clients append to the same file in quick succession, one append sometimes overwrites the other. This happens on some clients but not others; we're still trying to track down the pattern, if any. We've failed all production filesystems to prevent further data

[ceph-users] Prioritize backfill from one osd

2021-09-07 Thread ceph-users
Hi, I am removing a number of OSDs from my cluster. They have been marked out and the backfilling is progressing slowly but surely. What I would like to do at this point is actually finish one of the osds (at a time) so that I can remove that disk and shutdown that daemon. I have tried ceph
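
One way to push a single OSD's PGs to the front of the queue (a sketch, not necessarily what the poster tried; force-backfill only reorders work, it does not add throughput):

    # list PGs still waiting to backfill that involve the OSD being drained (osd.12 is a placeholder)
    ceph pg ls-by-osd osd.12 backfill_wait
    # bump selected PGs to the head of the backfill queue
    ceph pg force-backfill <pgid> [<pgid> ...]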

[ceph-users] Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)

2021-09-07 Thread Marc
Not really. I am currently using the FUSE mount and decided not to use CephFS for important things. I remember that I got this after some update; with that update there was something you had to 'enable', which I did not do straight away. But after enabling that, I started having these

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-07 Thread Frank Schilder
Hi Dan and Patrick, I collected some additional information trying the following: delete a snapshot, add a snapshot. My main concern was that no snaptrim operations would be executed. However, this is not the case. After removing a snapshot, the PGs on the respective pools started

[ceph-users] Cephadm not properly adding / removing iscsi services anymore

2021-09-07 Thread Paul Giralt (pgiralt)
This was working until recently and now seems to have stopped working. Running Pacific 16.2.5. When I modify the deployment YAML file for my iscsi gateways, the services are not being added or removed as requested. It’s as if the state is “stuck”. At one point I had 4 iSCSI gateways: 02, 03,

[ceph-users] debug RBD timeout issue

2021-09-07 Thread Tony Liu
Hi, I have OpenStack Ussuri and Ceph Octopus. Sometimes I see timeouts when creating or deleting volumes. I can see RBD timeouts from cinder-volume. Has anyone seen such an issue? I'd like to see what happens on Ceph. Which service should I look into? Is it stuck with a mon or any OSD? Any option to
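
To see what librbd is doing on the Cinder side, one common first step (a sketch, assuming the cinder-volume host reads a standard ceph.conf; log path and debug levels are illustrative) is to enable client-side logging:

    # ceph.conf on the cinder-volume host
    [client]
        debug rbd = 20
        debug rados = 20
        log file = /var/log/ceph/client.$name.$pid.log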

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-07 Thread Frank Schilder
Dear Dan, thanks for the fast reply! > ... when you set mds_recall_max_decay_rate there is a side effect that all > session recall_caps_throttle's are re-initialized OK, something like this could be a problem with the number of clients we have. I guess, next time I wait for a service window

[ceph-users] ceph progress bar stuck and 3rd manager not deploying

2021-09-07 Thread mabi
Hello, I have a test Ceph 16.2.5 (Pacific) cluster deployed with cephadm on 7 nodes running Ubuntu 20.04 LTS bare metal. I just upgraded each node's kernel and performed a rolling reboot, and now the ceph -s output is stuck somehow and the manager service is only deployed to two nodes instead of 3 nodes.
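
A couple of things worth checking in that situation (a generic sketch, not advice given in the thread):

    ceph orch ps --daemon-type mgr   # which mgr daemons cephadm thinks exist
    ceph orch apply mgr 3            # re-assert the target count of managers
    ceph mgr fail                    # fail over the active mgr if the orchestrator seems wedged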

[ceph-users] Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)

2021-09-07 Thread Frank Schilder
Hi Marc, did you ever get a proper solution for this problem? We are having exactly the same issue, having snapshots on a file system leads to incredible performance degradation. I'm reporting some observations here (latest reply):

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-07 Thread Frank Schilder
Hi Dan, I think I need to be a bit more precise. When I do the following (mimic 13.2.10, latest):
    # ceph config dump | grep mds_recall_max_decay_rate
    # [no output]
    # ceph config get mds.0 mds_recall_max_decay_rate
    2.50
    # ceph config set mds mds_recall_max_decay_rate 2.5
    # the MDS cluster

[ceph-users] Re: cephfs_metadata pool unexpected space utilization

2021-09-07 Thread Denis Polom
Hi, any help here, please? I observe the same behavior on a cluster I just updated with the latest Octopus. Any help will be appreciated, thx. On 8/6/21 14:41, Denis Polom wrote: Hi, I observe strange behavior on my Ceph MDS cluster, where the cephfs_metadata pool is filling up without obvious

[ceph-users] CentOS Linux 8 EOL

2021-09-07 Thread Dan van der Ster
Dear friends, We wanted to clarify the plans / expectations for when CentOS Linux 8 reaches EOL at the end of this year. The default plan for our prod clusters is to upgrade servers in place from Linux 8.4 to Stream 8. (We already started this a couple months ago, ran into a couple minor fixable

[ceph-users] Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)

2021-09-07 Thread Sebastian Knust
Hi, I too am still suffering the same issue (snapshots lead to 100% ceph-msgr usage on client during metadata-intensive operations like backup and rsync) and had previously reported it to this list. This issue is also tracked at https://tracker.ceph.com/issues/44100 My current observations:

[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-07 Thread Dan van der Ster
Hi, On Tue, Sep 7, 2021 at 1:55 PM Frank Schilder wrote: > > Hi Dan, > > I think I need to be a bit more precise. When I do the following (mimic > 13.2.10, latest): > > # ceph config dump | grep mds_recall_max_decay_rate > # [no output] > # ceph config get mds.0 mds_recall_max_decay_rate >

[ceph-users] RGW: Handling of ' ' , +, %20,and %2B in Filenames

2021-09-07 Thread Ingo Reimann
Hi, we observed some strange behaviour: the Ruby gems fog + carrierwave handle URIs differently when performing a GET or a DELETE request. GET: blanks are encoded as '%20'. DELETE: blanks are encoded as '+'. This behaviour is bad, because Ceph treats the two names as different files, but returns "204 No Content" on
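
The underlying rule is that '+' only means a space in query strings, not in paths, so '/bucket/a+b' and '/bucket/a%20b' are two different object keys. A quick illustration of the decoding difference (Python one-liner used only to show the semantics, not part of the thread):

    python3 -c "from urllib.parse import unquote, unquote_plus; print(unquote('a+b'), unquote('a%20b'), unquote_plus('a+b'))"
    # prints: a+b a b a b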

[ceph-users] Re: Problem mounting cephfs Share

2021-09-07 Thread Hendrik Peyerl
It turned out that I ran into an issue with a very recent Kernel (5.13). I only had ms_bind_ipv6 = true in my ceph.conf but apparently I also needed ms_bind_ipv4 = false. Took me a while to get to that result ;) > On 7. Sep 2021, at 13:21, Eugen Block wrote: > > Could you share the exact
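
For anyone hitting the same thing, the resulting ceph.conf fragment would look like this (a sketch of the two options Hendrik mentions):

    [global]
        ms_bind_ipv6 = true
        ms_bind_ipv4 = false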

[ceph-users] Re: Problem mounting cephfs Share

2021-09-07 Thread Eugen Block
Could you share the exact command you're trying and then also 'ceph auth get client.'? Quoting Hendrik Peyerl: Hi Eugen, thanks for the idea but I didn't have anything mounted that I could unmount. On 6. Sep 2021, at 09:15, Eugen Block wrote: Hi, I just got the same message in my
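
For reference, the kind of detail Eugen is asking for would look something like this (client name, monitor address, path and secret file are placeholders):

    ceph auth get client.cephfsuser
    mount -t ceph mon1.example.com:6789:/ /mnt/cephfs -o name=cephfsuser,secretfile=/etc/ceph/cephfsuser.secret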

[ceph-users] Re: Brand New Cephadm Deployment, OSDs show either in/down or out/down

2021-09-07 Thread nORKy
Hi, thank you Sebastian, creating the folder /usr/lib/sysctl.d fixed the bug! So it's a Debian-specific bug. 'Jof On Thu, Sep 2, 2021 at 10:52, Sebastian Wagner wrote: > Can you verify that the `/usr/lib/sysctl.d/` folder exists on your > Debian machines? > > On 01.09.21 at 15:19,

[ceph-users] cephadm sysctl-dir parameter does not affect location of /usr/lib/sysctl.d/90-ceph-${fsid}-osd.conf

2021-09-07 Thread Gosch, Torsten
Hello, I'm attempting to run the latest Ceph release (v16.2.5) on Fedora CoreOS (stable release 34.20210808.3.0). So far I have managed to bootstrap the cluster without any significant problems, but unfortunately one thing prevents further progress: the OSD containers won't start because

[ceph-users] Re: Performance optimization

2021-09-07 Thread Robert Sander
On 07.09.21 at 11:49, Simon Sutter wrote: I never looked into RocksDB, because I thought writing data 24/7 does not benefit from caching. But this is metadata storage, so I might profit from it. Due to a lack of SATA ports, is it possible to put all RocksDBs on one SSD? It should still be

[ceph-users] Re: Performance optimization

2021-09-07 Thread Simon Sutter
I never looked into RocksDB, because I thought writing data 24/7 does not benefit from caching. But this is metadata storage, so I might profit from it. Due to a lack of SATA ports, is it possible to put all RocksDBs on one SSD? It should still be faster to write it to just one SSD, instead
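
Several block.db volumes can indeed share one SSD; with a reasonably recent ceph-volume this is typically done in a single batch call. A sketch with placeholder device names (not sizing advice, and not necessarily how Simon's nodes are laid out):

    # four HDD OSDs, all block.db volumes carved out of the single SSD /dev/sdf
    ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc /dev/sdd /dev/sde --db-devices /dev/sdf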

[ceph-users] Re: New Pacific deployment, "failed to find osd.# in keyring" errors

2021-09-07 Thread nORKy
Hi, same problem here too with 'orch apply osd --all-available-devices': cephadm 2021-09-07T09:12:34.256134+ mgr.POC-568.iozqlk (mgr.44107) 499 : cephadm [ERR] executing create_from_spec_one(([('POC-569', wrote: > I'm trying to bring up a new cluster, just installed, and I'm getting > errors

[ceph-users] Re: Drop of performance after Nautilus to Pacific upgrade

2021-09-07 Thread Martin Mlynář
Hello, we've noticed a similar issue after upgrading our test 3-node cluster from 15.2.14-1~bpo10+1 to 16.1.0-1~bpo10+1. Quick tests using rados bench:
16.2.5-1~bpo10+1:
    Total time run:    133.28
    Total writes made: 576
    Write size:        4194304
    Object size:       4194304
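
For comparison, a typical invocation that produces output like the above (pool name, runtime and concurrency are assumptions, not necessarily what Martin used):

    rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup
    rados bench -p testpool 60 seq      # read back the objects written above
    rados -p testpool cleanup           # remove the benchmark objects afterwards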