[ceph-users] ERROR: S3 error: 403 (SignatureDoesNotMatch)

2021-03-11 Thread Szabo, Istvan (Agoda)
Hi, I'm struggling with the CNAMEd address of my old cluster. The s3 and curl commands work properly with the non-CNAMEd address, but with the CNAMEd one I get this in the civetweb log: 2021-03-12 10:24:18.812329 7f6b0c527700 1 == starting new request req=0x7f6b0c520f90 = 2021-03-12

[ceph-users] Re: Ceph server

2021-03-11 Thread Ignazio Cassano
Many thanks Ignazio On Fri 12 Mar 2021, 00:04 Reed Dier wrote: > I'm going to echo what Stefan said. > > I would ditch the 2x SATA drives to free up your slots. > Replace with an M.2 or SATADOM. > > I would also recommend moving from the 2x X710-DA2 cards to 1x X710-DA4 > card. > It can't

[ceph-users] Re: how to tell balancer to balance

2021-03-11 Thread Boris Behrens
Hi Joe, I've tried creating a plan on my own, but I still get the same message (Error EALREADY: Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect). I've also tried reweighting the three most-filled OSDs to 0.8, which worked well. After
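
For reference, a manual plan cycle with the balancer module usually looks like this (the plan name is a placeholder, not from Boris' setup):
$ ceph balancer status
$ ceph balancer optimize myplan
$ ceph balancer show myplan
$ ceph balancer eval myplan
$ ceph balancer execute myplan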

[ceph-users] Question about delayed write IOs, octopus, mixed storage

2021-03-11 Thread Philip Brown
I'm running some tests with mixed storage units and Octopus. 8 nodes, each with 2 SSDs and 8 HDDs. The SSDs are relatively small: around 100GB each. I'm mapping 8 RBDs, striping them together, and running fio on them for testing. # fio --filename=/./fio.testfile --size=120GB --rw=randrw
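
For context, a complete invocation along the lines of the truncated command above might look like this; the mount point, block size, read/write mix and runtime are assumptions, not Philip's actual values:
$ fio --filename=/mnt/striped/fio.testfile --size=120GB --rw=randrw --rwmixread=70 \
      --bs=4k --ioengine=libaio --iodepth=32 --direct=1 \
      --runtime=300 --time_based --name=randrw-test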

[ceph-users] Unhealthy Cluster | Remove / Purge duplicate osds | Fix daemon

2021-03-11 Thread Oliver Weinmann
Hi, On my 3-node Octopus 15.2.5 test cluster, which I haven't used for quite a while, I noticed that it shows some errors: [root@gedasvl02 ~]# ceph health detail INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af INFO:cephadm:Inferring config

[ceph-users] Re: [External Email] Re: Re: Failure Domain = NVMe?

2021-03-11 Thread Dave Hall
Steven, In my current hardware configurations each NVMe supports multiple OSDs. In my earlier nodes it is 8 OSDs sharing one NVMe (which is also too small). In the near term I will add NVMe to those nodes, but I'll still have 5 OSDs per NVMe on some nodes, and 2 or 3 on all the others. So an NVMe failure

[ceph-users] Re: Ceph server

2021-03-11 Thread Reed Dier
I'm going to echo what Stefan said. I would ditch the 2x SATA drives to free up your slots. Replace with an M.2 or SATADOM. I would also recommend moving from the 2x X710-DA2 cards to 1x X710-DA4 card. It can't saturate the x8 slot, and it frees up a PCIe slot for possibly another NVMe card or

[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Dave Hall
Hello, While I appreciate and acknowledge the concerns regarding host failure and maintenance shutdowns, our main concern at this time is data loss. Our use case allows for suspension of client I/O and/or for a full cluster shutdown for maintenance, but loss of data would be

[ceph-users] Re: Best way to add OSDs - whole node or one by one?

2021-03-11 Thread Reed Dier
I'm sure there is a "correct" way, but I think it mostly relates to how busy your cluster is, and how tolerant it is of the added load from the backfills. My current modus operandi is to set the noin, noout, nobackfill, norecover, and norebalance flags first. This makes sure that new OSDs don't
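
As a sketch, the flag sequence Reed describes before adding OSDs, and its reversal once the new OSDs are in place and you are ready to let data move:
$ for f in noin noout nobackfill norecover norebalance; do ceph osd set $f; done
  ... add the new OSDs, then release the flags:
$ for f in norebalance norecover nobackfill noout noin; do ceph osd unset $f; done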

[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-11 Thread Chris Dunlop
Hi Frank, I agree there's a problem there. However, to clarify: the json file already contains the /dev/sdq1 path (at data:path) and the "simple activate" is just reading the file. I.e. the problem lies with the json file creator, which was the "ceph-volume simple scan" step. To fix your

[ceph-users] v14.2.17 Nautilus released

2021-03-11 Thread David Galloway
We're happy to announce the 17th backport release in the Nautilus series. We recommend that users update to this release. For detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/releases/v14-2-17-nautilus-released Notable Changes

[ceph-users] Re: [External Email] Re: Re: Failure Domain = NVMe?

2021-03-11 Thread Steven Pine
Setting the failure domain to host will accomplish nearly the same goal and provide better results during maintenance, host reboots, and of course host failures. Otherwise you can try manually creating CRUSH maps and mapping a failure domain to NVMe and the OSDs under it, but the additional work

[ceph-users] Can FS snapshots cause factor 3 performance loss?

2021-03-11 Thread Frank Schilder
Hi all, we are observing a dramatic performance drop on our ceph file system and are wondering if this could be related to ceph fs snapshots. We are taking rotating snapshots in 2 directories and have 11 snapshots in each (ls below) as of today. We observe the performance drop with an rsync

[ceph-users] Re: [External Email] Re: Re: Failure Domain = NVMe?

2021-03-11 Thread Marc
> In my current hardware configurations each NVMe supports multiple OSDs. > In > my earlier nodes it is 8 OSDs sharing one NVMe (which is also too > small). > In the near term I will add NVMe to those nodes, but I'll still have 5 > OSDs > some OSDs, and 2 or 3 on all the others. So an NVMe

[ceph-users] Re: mon db growing. over 500Gb

2021-03-11 Thread ricardo.re.azevedo
Hi Andreas, That's good to know. I managed to fix the problem! Here is my journey in case it helps anyone: My system drives are only 512GB, so I added spare 1TB drives to each server and moved the mon db to the new drive. I set noout, nobackfill and norecover and enabled only the ceph mon and osd
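
As a rough sketch only (paths and unit names assumed, not taken from Ricardo's mail), relocating a mon's data directory to a new disk generally amounts to:
$ systemctl stop ceph-mon@$(hostname -s)
$ rsync -a /var/lib/ceph/mon/ceph-$(hostname -s)/ /mnt/newdisk/mon/
  ... mount the new disk over /var/lib/ceph/mon/ceph-$(hostname -s), then:
$ systemctl start ceph-mon@$(hostname -s)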

[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Steven Pine
Setting the failure domain on a per-node basis will prevent data loss in the case of an NVMe failure; you would need multiple NVMe failures across different hosts. If data loss is the primary concern then again, you will want a higher EC ratio, 6:3 or 6:4, but with only 6 OSDs, then 4:2 or even 3:3, or

[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Sebastian Wagner
yes On 11.03.21 at 15:46, Kai Stian Olstad wrote: > Hi Sebastian > > On 11.03.2021 13:13, Sebastian Wagner wrote: >> looks like >> >> $ ssh pech-hd-009 >> # cephadm ls >> >> is returning this non-existent OSDs. >> >> can you verify that `cephadm ls` on that host doesn't >> print osd.355 ? > >

[ceph-users] Re: Alertmanager not using custom configuration template

2021-03-11 Thread Marc 'risson' Schmitt
Quick follow-up on this, On Thu, 11 Mar 2021 14:58:41 +0100 Marc 'risson' Schmitt wrote: > > Indeed. I just merged https://github.com/ceph/ceph/pull/39932 > > which fixes the names of those config keys. Cephadm is supposed to include some default Prometheus configuration for alerting[1], if

[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Christian Wuerdig
For EC 8+2 you can get away with 5 hosts by ensuring each host gets 2 shards similar to this: https://ceph.io/planet/erasure-code-on-small-clusters/ If a host dies/goes down you can still recover all data (although at that stage your cluster is no longer available for client io). You shouldn't
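
A CRUSH rule along these lines (id and name are placeholders) implements the 2-shards-per-host layout from the linked article, picking 5 hosts and then 2 OSDs within each for an 8+2 profile:
rule ec82_5host_2osd {
    id 2
    type erasure
    step set_chooseleaf_tries 5
    step take default
    step choose indep 5 type host
    step chooseleaf indep 2 type osd
    step emit
}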

[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-11 Thread Frank Schilder
Hi Chris, I found the problem. "ceph-volume simple activate" modifies the OSD's meta data in an invalid way. On a pre lvm-converted ceph-disk OSD I had in my cupboard: [root@ceph-adm:ceph-20 ~]# mount /dev/sdq1 mnt [root@ceph-adm:ceph-20 ~]# ls -l mnt [...] lrwxrwxrwx. 1 ceph ceph 58 Mar 15

[ceph-users] Re: Container deployment - Ceph-volume activation

2021-03-11 Thread 胡 玮文
Hi, I'm assuming you are using cephadm? Check out https://docs.ceph.com/en/latest/cephadm/osd/#activate-existing-osds ceph cephadm osd activate ... On 11 Mar 2021, at 23:01, Cloud Guy wrote: Hello, TL;DR Looking for guidance on ceph-volume lvm activate --all as it would apply to a containerized
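
For reference, the command from the linked doc simply takes the host(s) whose existing OSDs should be scanned and activated, e.g.:
$ ceph cephadm osd activate <host1> [<host2> ...]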

[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Steven Pine
One potential issue is maintenance after an NVMe failure. Depending on how the hardware is configured, you will need to bring the whole node down to replace the failed NVMe, which could cause PGs to become read-only if you are close to your min threshold. I think the additional risk is not worth it,

[ceph-users] Re: OSDs crashing after server reboot.

2021-03-11 Thread Cassiano Pilipavicius
Hi, this error really only showed up when I tried to run ceph-bluestore-tool repair. The 3 OSDs that keep crashing show the following log... please let me know if there is something I can do to get the pool back to a functioning state. Uptime(secs): 0.0 total, 0.0 interval

[ceph-users] Re: cephadm (curl master)/15.2.9:: how to add orchestration

2021-03-11 Thread Sebastian Wagner
Hi Adrian, On 11.03.21 at 13:55, Adrian Sevcenco wrote: > Hi! After an initial bumpy bootstrapping (IMHO the defaults should be > whatever is already defined in .ssh of the user and custom values setup > with cli arguments) now i'm stuck adding any service/hosts/osds because > apparently i

[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Dave Hall
Istvan, I agree that there is always risk with failure-domain < node, especially with EC pools. We are accepting this risk to lower the financial barrier to entry. In our minds, we have good power protection and new hardware, so the greatest immediate risks for our smaller cluster (approaching

[ceph-users] Ceph osd Reweight command in octopus

2021-03-11 Thread Brent Kennedy
We have a Ceph Octopus cluster running 15.2.6, and it's indicating a near-full OSD which I can see is not weighted equally with the rest of the OSDs. I tried the usual "ceph osd reweight osd.0 0.95" to force it down a little bit, but unlike on the Nautilus clusters, I see no data movement when
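
One thing worth checking (a guess, not confirmed in the thread): on Octopus the balancer module is typically on in upmap mode and can counteract manual reweights, so it may help to look at:
$ ceph balancer status
$ ceph osd df tree
$ ceph balancer off    # temporarily, to see whether the reweight then triggers movement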

[ceph-users] Re: OSDs crashing after server reboot.

2021-03-11 Thread Igor Fedotov
Hi Cassiano, the backtrace you've provided relates to the bug fixed by https://github.com/ceph/ceph/pull/37793. This fix is going to be released with the upcoming v14.2.17. But I doubt that your original crashes have the same root cause - this issue appears during shutdown only. Anyway

[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Sebastian Wagner
Hi Kai, looks like $ ssh pech-hd-009 # cephadm ls is returning this non-existent OSD. Can you verify that `cephadm ls` on that host doesn't print osd.355? Best, Sebastian On 11.03.21 at 12:16, Kai Stian Olstad wrote: > Before I started the upgrade the cluster was healthy but one >

[ceph-users] OSDs crashing after server reboot.

2021-03-11 Thread Cassiano Pilipavicius
Hi, please, if someone knows how to help: I have an HDD pool in my cluster, and after rebooting one server my OSDs have started to crash. This pool is a backup pool and has OSD as the failure domain with a size of 2. After rebooting one server my OSDs started to crash, and the thing is only getting

[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad
On 11.03.2021 15:47, Sebastian Wagner wrote: yes On 11.03.21 at 15:46, Kai Stian Olstad wrote: To resolve it, could I just remove it with "cephadm rm-daemon"? That worked like a charm, and the upgrade has resumed. Thank you Sebastian. -- Kai Stian Olstad
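
For the archives, the removal is run on the host that still lists the stale daemon, roughly (fsid is a placeholder):
$ cephadm rm-daemon --name osd.355 --fsid <cluster-fsid>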

[ceph-users] Re: NVME pool creation time :: OSD services strange state - SOLVED

2021-03-11 Thread Adrian Sevcenco
On 3/11/21 5:01 PM, Adrian Sevcenco wrote: On 3/11/21 4:45 PM, Adrian Sevcenco wrote: Hi! So, after i selected the tags to add 2 nvme ssds i declared a replicated n=2 pool .. and for the last 30 min the progress shown in notification is 0% and iotop shows around 100K/s for 2 (???) ceph-mon

[ceph-users] Re: NVME pool creation time :: OSD services strange state

2021-03-11 Thread Adrian Sevcenco
On 3/11/21 4:45 PM, Adrian Sevcenco wrote: Hi! So, after i selected the tags to add 2 nvme ssds i declared a replicated n=2 pool .. and for the last 30 min the progress shown in notification is 0% and iotop shows around 100K/s for 2 (???) ceph-mon processes and that all ... and in my service

[ceph-users] Container deployment - Ceph-volume activation

2021-03-11 Thread Cloud Guy
Hello, TL;DR Looking for guidance on ceph-volume lvm activate --all as it would apply to a containerized ceph deployment (Nautilus or Octopus). Detail: I’m planning to upgrade my Nautilus non-container cluster to Octopus (eventually containerized). There’s an expanded procedure that was

[ceph-users] Re: Openstack rbd image Error deleting problem

2021-03-11 Thread Konstantin Shalygin
You can enable the object-map feature online and rebuild it. This will help with deleting objects. k Sent from my iPhone > On 11 Mar 2021, at 04:05, Norman.Kern wrote: > > No, I use its default features like this: ___ ceph-users mailing list --
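
A sketch of the commands Konstantin refers to (pool/image names are placeholders; object-map depends on exclusive-lock, which may need enabling first):
$ rbd feature enable <pool>/<image> exclusive-lock
$ rbd feature enable <pool>/<image> object-map fast-diff
$ rbd object-map rebuild <pool>/<image>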

[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad
Hi Sebastian On 11.03.2021 13:13, Sebastian Wagner wrote: looks like $ ssh pech-hd-009 # cephadm ls is returning this non-existent OSDs. can you verify that `cephadm ls` on that host doesn't print osd.355 ? "cephadm ls" on the node does list this drive { "style": "cephadm:v1",

[ceph-users] Re: Alertmanager not using custom configuration template

2021-03-11 Thread Sebastian Wagner
Hi Marc, Indeed. I just merged https://github.com/ceph/ceph/pull/39932 which fixes the names of those config keys. Might want to try again (with slashes instead of underscores). Thanks for reporting this, Sebastian On 10.03.21 at 15:34, Marc 'risson' Schmitt wrote: > Hi, > > I'm trying to

[ceph-users] NVME pool creation time :: OSD services strange state

2021-03-11 Thread Adrian Sevcenco
Hi! So, after i selected the tags to add 2 nvme ssds i declared a replicated n=2 pool .. and for the last 30 min the progress shown in the notification is 0% and iotop shows around 100K/s for 2 (???) ceph-mon processes and that's all ... and in my service list the osd services look somehow empty:

[ceph-users] 3 x OSD won't start after host reboot

2021-03-11 Thread Andrew Walker-Brown
Hi all, I’m just testing a new cluster and after shutting down one of the hosts, when I bring it back up none of the OSDs will restart. The OSD services fail to start, and systemctl status for the service states “failed with result ‘exit-code’”. Where to start looking for the root
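
Generic places to start digging (not from the thread, just the usual suspects):
$ systemctl status ceph-osd@<id>
$ journalctl -u ceph-osd@<id> -b --no-pager
$ ceph-volume lvm list      # confirm the OSD's devices/LVs were detected after the reboot
$ dmesg | grep -i -e nvme -e 'sd[a-z]' -e error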

[ceph-users] Re: cephadm (curl master)/15.2.9:: how to add orchestration

2021-03-11 Thread Adrian Sevcenco
On 3/11/21 3:07 PM, Sebastian Wagner wrote: Hi Adrian, Hi! On 11.03.21 at 13:55, Adrian Sevcenco wrote: Hi! After an initial bumpy bootstrapping (IMHO the defaults should be whatever is already defined in .ssh of the user and custom values setup with cli arguments) now i'm stuck adding any

[ceph-users] Re: Alertmanager not using custom configuration template

2021-03-11 Thread Marc 'risson' Schmitt
Hi, On Thu, 11 Mar 2021 11:47:44 +0100 Sebastian Wagner wrote: > Indeed. I just merged https://github.com/ceph/ceph/pull/39932 > which fixes the names of those config keys. > > Might want to try again (with slashes instead of underscores). This was indeed the problem. Thanks for your fix!

[ceph-users] how to tell balancer to balance

2021-03-11 Thread Boris Behrens
Hi, I know this topic seems to be handled a lot (as far as I can see), but I have reached the end of my google-fu. * We have OSDs that are near full, but there are also OSDs that are only 50% full. * We have 4, 8 and 16 TB rotating disks in the cluster. * The disks that get packed are 4TB disks and

[ceph-users] Has anyone contact Data for Samsung Datacenter SSD Support ?

2021-03-11 Thread Christoph Adomeit
Hi, I hope someone here can help me out with contact data, an email address or a phone number for Samsung Datacenter SSD Support? If I contact standard Samsung Datacenter Support they tell me they are not there to support PM1735 drives. We are planning a new Ceph cluster and we are thinking of

[ceph-users] Re: cephadm (curl master)/15.2.9:: how to add orchestration

2021-03-11 Thread Janne Johansson
On Thu 11 Mar 2021 at 13:56, Adrian Sevcenco wrote: > apparently i lack orchestration .. the the documentation show a big > "Page does not exist" > see > https://docs.ceph.com/en/latest/docs/octopus/mgr/orchestrator > Where does this link come from? Usually "latest" and an actual release name

[ceph-users] cephadm (curl master)/15.2.9:: how to add orchestration

2021-03-11 Thread Adrian Sevcenco
Hi! After an initial bumpy bootstrapping (IMHO the defaults should be whatever is already defined in .ssh of the user and custom values setup with cli arguments) now i'm stuck adding any service/hosts/osds because apparently i lack orchestration .. the documentation shows a big "Page does

[ceph-users] Re: A practical approach to efficiently store 100 billions small objects in Ceph

2021-03-11 Thread Szabo, Istvan (Agoda)
Yeah, that makes sense and sounds like a good idea :) I've never thought about this; I will consider it for object store in our clusters. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e:

[ceph-users] Re: how smart is ceph recovery?

2021-03-11 Thread Marc
> > > > 2. If a down host comes up again and it's osd are started. Is data > still being copied, or does ceph see that checksums(?) > > PG or RADOS object epoch I think. So if data hasn’t changed, the > recovery completes without having anything to do. > > > are the same and just sets a

[ceph-users] Re: A practical approach to efficiently store 100 billions small objects in Ceph

2021-03-11 Thread Szabo, Istvan (Agoda)
Hi, It relates to this sentence: "The median object size is ~4KB, written in RBD images using the default 4MB[0] object size. That will be ~100 millions RADOS objects instead of 100 billions." Istvan Szabo Senior Infrastructure Engineer --- Agoda

[ceph-users] Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad
Before I started the upgrade the cluster was healthy, but one OSD (osd.355) was down; I can't remember if it was in or out. The upgrade was started with ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9 The upgrade started, but when Ceph tried to upgrade osd.355 it
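
For reference, the standard cephadm commands to inspect and pause/resume an in-flight upgrade while debugging a stuck daemon:
$ ceph orch upgrade status
$ ceph orch upgrade pause
$ ceph orch upgrade resume
$ ceph orch upgrade stop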

[ceph-users] Re: mon db growing. over 500Gb

2021-03-11 Thread Andreas John
Hello, I also observed an excessively growing mon DB in case of recovery. Luckily we were able to solve it by extending the mon db disk. Without having the chance to re-check: the options nobackfill and norecover might cause that behavior. It feels like the mon holds data that cannot be flushed to an

[ceph-users] Re: Unpurgeable rbd image from trash

2021-03-11 Thread Enrico Bocchi
Thanks a lot, that fixed the problem. For the record, we are running nautilus 14.2.11 and there is no such '--input-file' option for setomapval. `#rados -p volumes setomapval rbd_trash id_5afa5e5a07b8bc < key_file` does the trick. Cheers, Enrico On 10/03/2021 17:17, Jason Dillaman wrote:

[ceph-users] Re: mon db growing. over 500Gb

2021-03-11 Thread Marc
From what I have read here in the past, growing monitor db is related to not having pg's in 'clean active' state > -Original Message- > From: ricardo.re.azev...@gmail.com > Sent: 11 March 2021 00:59 > To: ceph-users@ceph.io > Subject: [ceph-users] mon db growing. over 500Gb > > Hi

[ceph-users] Re: A practical approach to efficiently store 100 billions small objects in Ceph

2021-03-11 Thread Loïc Dachary
Thanks for clarifying, I think I understand. The idea is that 1,000 ~4KB objects are packed together in RBD which stores them in a single 4MB RADOS object. Does that answer your question? On 11/03/2021 08:22, Szabo, Istvan (Agoda) wrote: > Hi, > > It relates to this sentence: > "The median