[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Eugen Block
Thanks for the comments, I'll get the log files to see if there's any hint. Getting the PGs into an active state is one thing, I'm sure multiple approaches would have worked. The main question is why this happens, we have 19 hosts to rebuild and can't risk the application outage every time.

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Anthony D'Atri
Something worth a try before restarting an OSD in situations like this: 'ceph osd down 9'. This marks the OSD down in the osdmap, but doesn’t touch the daemon. Typically the subject OSD will see this and tell the mons “I’m not dead yet!” and repeer, which sometimes suffices to clear
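A minimal sketch of this suggestion, using OSD 9 from Eugen's example as the primary of an affected PG:

  # Mark the OSD down in the osdmap only; the running daemon is untouched
  # and will typically report back in and re-peer its PGs.
  ceph osd down 9

  # Check whether the previously inactive PGs have peered and gone active.
  ceph pg stat
  ceph health detail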

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Zakhar Kirpichenko
Thanks everyone! /Zakhar On Wed, Apr 6, 2022 at 6:24 PM Josh Baergen wrote: > For future reference, "ceph pg repeer " might have helped here. > > Was the PG stuck in the "activating" state? If so, I wonder if you > temporarily exceeded mon_max_pg_per_osd on some OSDs when rebuilding > your
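A hedged sketch of both suggestions; PG 32.18 is borrowed from Eugen's example further down and stands in for any stuck PG:

  # Ask the primary OSD to re-run peering for a single PG.
  ceph pg repeer 32.18

  # Check whether the per-OSD PG limit may have been exceeded during the
  # rebuild; the PGS column of 'ceph osd df' shows the current count per OSD.
  ceph config get mon mon_max_pg_per_osd
  ceph osd df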

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Eugen Block
Sure, from the output of 'ceph pg map ' you get the acting set, for example: cephadmin:~ # ceph pg map 32.18 osdmap e7198 pg 32.18 (32.18) -> up [9,2,1] acting [9,2,1] Then I restarted OSD.9 and the inactive PG became active again. I remember this has been discussed a couple of times in the
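Spelled out, the sequence looks roughly like this, assuming a systemd-managed, non-containerized Nautilus deployment (adjust the restart command for cephadm/containers):

  # The first OSD in the up/acting set is the primary (here: osd.9).
  ceph pg map 32.18

  # Restart the primary OSD of the inactive PG.
  systemctl restart ceph-osd@9

  # Verify the PG has become active again.
  ceph pg 32.18 query | grep -m1 '"state"'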

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Zakhar Kirpichenko
Hi Eugen, Can you please elaborate on what you mean by "restarting the primary PG"? Best regards, Zakhar On Wed, Apr 6, 2022 at 5:15 PM Eugen Block wrote: > Update: Restarting the primary PG helped to bring the PGs back to > active state. Consider this thread closed. > > > Quoting Eugen

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Eugen Block
Update: Restarting the primary PG helped to bring the PGs back to active state. Consider this thread closed. Quoting Eugen Block: Hi all, I have a strange situation here, a Nautilus cluster with two DCs, the main pool is an EC pool with k=7 m=11, min_size = 8 (failure domain host). We

[ceph-users] Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Eugen Block
Hi all, I have a strange situation here, a Nautilus cluster with two DCs, the main pool is an EC pool with k=7 m=11, min_size = 8 (failure domain host). We confirmed failure resiliency multiple times for this cluster, today we rebuilt one node resulting in currently 34 inactive PGs. I'm
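For context, a generic sketch (not taken from the original post) of how the inactive PGs and their acting sets are usually identified:

  # List PGs that are stuck inactive and the OSDs they map to.
  ceph health detail
  ceph pg dump_stuck inactive

  # Inspect one of the affected PGs in detail; replace <pgid> with an id
  # from the output above.
  ceph pg <pgid> query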

[ceph-users] Re: Quincy: mClock config propagation does not work properly

2022-04-06 Thread Sridhar Seshasayee
Hi Luis, While I work on the fix, I thought a workaround could be useful. I am adding it here and also in the tracker I mentioned in my previous update. -- *Workaround* Until a fix is available, the following workaround may be used to override the parameters: 1. Run the injectargs command as
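The actual parameters to override are the ones named in the tracker; purely to illustrate the injectargs mechanism, with an mClock option picked here as an example:

  # Inject a value directly into a running OSD, bypassing the mon config store.
  ceph tell osd.0 injectargs '--osd_mclock_scheduler_client_wgt=2'

  # Or apply it to all OSDs at once.
  ceph tell osd.* injectargs '--osd_mclock_scheduler_client_wgt=2'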

[ceph-users] Ceph status HEALT_WARN - pgs problems

2022-04-06 Thread Dominique Ramaekers
Hi, My cluster is up and running. I saw a note in ceph status that 1 pg was undersized. I read about the number of PGs and the recommended value (OSDs*100/poolsize => 6*100/3 = 200). The pg_num should be raised carefully, so I raised it to 2 and ceph status was fine again. So I left it like it
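For reference, the usual way to adjust the PG count; the pool name and target value are placeholders, pg_num should normally be a power of two, and the autoscaler can manage it instead:

  # Raise pg_num on a pool (pgp_num follows automatically on recent releases).
  ceph osd pool set <pool> pg_num 256

  # Alternatively, let the autoscaler pick a sensible value.
  ceph osd pool set <pool> pg_autoscale_mode on
  ceph osd pool autoscale-status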

[ceph-users] Re: latest octopus radosgw missing cors header

2022-04-06 Thread Boris Behrens
Ok, apparently I was "holding it wrong". (kudos to dwfreed from IRC for helping) When I send the Origin header correctly, I also receive the correct CORS header: root@rgw-1:~# curl -s -D - -o /dev/null --header "Host: kervyn.de" -H "Origin: https://example.com" 'http://
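A cleaned-up version of such a request (the archive mangled the quoting); endpoint, bucket and object are placeholders:

  curl -s -D - -o /dev/null \
    -H "Host: kervyn.de" \
    -H "Origin: https://example.com" \
    "http://<rgw-endpoint>/<bucket>/<object>"

  # The response headers should now include Access-Control-Allow-Origin.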

[ceph-users] Re: mons on osd nodes with replication

2022-04-06 Thread Eugen Block
Hi Ali, it's very common to have MONs and OSDs colocated on the same host. Quoting Ali Akil: Hello everyone, I am planning a Ceph cluster on 3 storage nodes (12 OSDs per cluster with BlueStore). Each node has 192 GB of memory and 24 CPU cores. I know it's recommended to have
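If the cluster is deployed with cephadm (an assumption, the original post doesn't say), colocation simply means placing both services on the same three hosts, e.g.:

  # Place the monitors on the three storage nodes (hostnames are placeholders).
  ceph orch apply mon --placement="node1,node2,node3"

  # Let cephadm create OSDs from the available disks on the same hosts.
  ceph orch apply osd --all-available-devices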

[ceph-users] mons on osd nodes with replication

2022-04-06 Thread Ali Akil
Hello everyone, I am planning a Ceph cluster on 3 storage nodes (12 OSDs per cluster with BlueStore). Each node has 192 GB of memory and 24 CPU cores. I know it's recommended to have separate MON and OSD hosts, in order to minimize disruption since monitor and OSD daemons are not inactive

[ceph-users] Re: Ceph remote disaster recovery at PB scale

2022-04-06 Thread Eugen Block
Thanks for the clarification, I get it now. This would be quite helpful to have in the docs, I believe. ;-) Quoting Arthur Outhenin-Chalandre: Hi Eugen, On 4/6/22 09:47, Eugen Block wrote: I don't mean to hijack this thread, I'm just curious about the multiple mirror daemons statement.

[ceph-users] Re: RuntimeError on activate lvm

2022-04-06 Thread Dominique Ramaekers
Thanks Janne for the tip! I removed the keys with 'ceph auth rm'. The LVMs are now added automatically! > -Original message- > From: Janne Johansson > Sent: Wednesday, 6 April 2022 10:48 > To: Dominique Ramaekers > CC: Eugen Block ; ceph-users@ceph.io > Subject: Re:
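The cleanup Dominique describes, sketched with an example OSD id (the real ids come from the earlier failed attempts):

  # Stale auth entries from previously removed OSDs block re-creation.
  ceph auth ls | grep '^osd\.'

  # Remove the leftover key for the old OSD id, then retry the OSD creation.
  ceph auth rm osd.2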

[ceph-users] latest octopus radosgw missing cors header

2022-04-06 Thread Boris Behrens
Hi, I'm just trying to get CORS headers to work (we've always set them on a front-facing HAProxy, but a customer wants their own). I've set the CORS policy via aws cli (please don't mind the test header): $ cat cors.json {"CORSRules": [{"AllowedOrigins": ["https://example.com"],"AllowedHeaders":
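A complete version of such a policy and how it is typically applied against RGW; the cors.json above is truncated, so the remaining fields, bucket and endpoint here are placeholders:

  $ cat cors.json
  {"CORSRules": [{"AllowedOrigins": ["https://example.com"],
                  "AllowedHeaders": ["*"],
                  "AllowedMethods": ["GET", "HEAD"]}]}

  $ aws s3api put-bucket-cors --bucket <bucket> \
      --cors-configuration file://cors.json --endpoint-url https://<rgw-endpoint>

  $ aws s3api get-bucket-cors --bucket <bucket> --endpoint-url https://<rgw-endpoint>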

[ceph-users] Re: RuntimeError on activate lvm

2022-04-06 Thread Janne Johansson
On Wed, 6 Apr 2022 at 10:44, Dominique Ramaekers wrote: > > Additionally, if I try to add the volume automatically (I zapped the lvm and > removed the OSD entries with ceph osd rm, then recreated the LVs). Now I get > this... > Command: 'ceph orch daemon add osd hvs001:/dev/hvs001_sda2/lvol0'

[ceph-users] Re: RuntimeError on activate lvm

2022-04-06 Thread Dominique Ramaekers
Additionally, if I try to add the volume automatically (I zapped the lvm and removed the OSD entries with ceph osd rm, then recreated the LVs). Now I get this... Command: 'ceph orch daemon add osd hvs001:/dev/hvs001_sda2/lvol0' Errors: RuntimeError: cephadm exited with an error code: 1,
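As the follow-up in this thread shows, the failure was caused by leftover auth keys from the zapped OSDs. A sketch of a removal sequence that also drops the auth key before re-creating the OSD (the id is a placeholder):

  # Remove an OSD completely: CRUSH entry, osdmap entry and auth key.
  ceph osd purge <id> --yes-i-really-mean-it

  # Then re-add the OSD on the logical volume via the orchestrator.
  ceph orch daemon add osd hvs001:/dev/hvs001_sda2/lvol0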

[ceph-users] Re: Ceph remote disaster recovery at PB scale

2022-04-06 Thread Arthur Outhenin-Chalandre
Hi Eugen, On 4/6/22 09:47, Eugen Block wrote: > I don't mean to hijack this thread, I'm just curious about the > multiple mirror daemons statement. Last year you mentioned that > multiple daemons only make sense if you have different pools to mirror > [1], at least that's how I read it, you

[ceph-users] Re: RuntimeError on activate lvm

2022-04-06 Thread Dominique Ramaekers
Hi Eugen, Thanks for the quick response! I'm probably doing things the more difficult (wrong) way. This is my first installation of a Ceph cluster. I'm setting up three servers for non-critical data and low I/O load. I don't want to lose capacity in storage space by losing the entire disk on

[ceph-users] Re: RuntimeError on activate lvm

2022-04-06 Thread Eugen Block
Hi, is there any specific reason why you do it manually instead of letting cephadm handle it? I might misremember but I believe for the manual lvm activation to work you need to pass the '--no-systemd' flag. Regards, Eugen Quoting Dominique Ramaekers: Hi, I've set up a ceph cluster
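A minimal sketch of the manual activation Eugen refers to; the OSD id and fsid are placeholders taken from the 'ceph-volume lvm list' output:

  # List the prepared LVs and note the osd id and osd fsid.
  ceph-volume lvm list

  # Activate without creating/starting a systemd unit.
  ceph-volume lvm activate --no-systemd <osd-id> <osd-fsid>

  # Or activate everything ceph-volume can discover on this host.
  ceph-volume lvm activate --all --no-systemd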

[ceph-users] Re: Ceph remote disaster recovery at PB scale

2022-04-06 Thread Eugen Block
Hi, I don't mean to hijack this thread, I'm just curious about the multiple mirror daemons statement. Last year you mentioned that multiple daemons only make sense if you have different pools to mirror [1], at least that's how I read it, you wrote: [...] but actually you can have multiple

[ceph-users] Re: Ceph remote disaster recovery at PB scale

2022-04-06 Thread Arthur Outhenin-Chalandre
Hi, On 4/1/22 10:56, huxia...@horebdata.cn wrote: > 1) Is rbd mirroring with petabytes of data doable or not? Are there any > practical limits on the size of the total data? So the first thing that matters with rbd replication is the amount of data you write; if you have a PB that mostly doesn't
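For orientation, a generic sketch (not CERN's actual setup) of enabling and monitoring RBD mirroring on a pool; the pool name is a placeholder:

  # Enable journal-based mirroring for the whole pool
  # ('image' mode would require enabling mirroring per image instead).
  rbd mirror pool enable <pool> pool

  # Check replication health and per-image status on the secondary site.
  rbd mirror pool status <pool> --verbose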