[ceph-users] Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-13 Thread Eugen Block
I remember someone reporting the same thing but I can’t find the thread right now. I’ll try again tomorrow. Zitat von Wesley Dillingham : I have a brand new Cluster 16.2.9 running bluestore with 0 client activity. I am modifying some crush weights to move PGs off of a host for testing

[ceph-users] Re: Ceph pool set min_write_recency_for_promote not working

2022-06-10 Thread Eugen Block
Hi, is your new pool configured as a cache-tier? The option you're trying to set is a cache-tier option [1]. Could the old pool have been a cache pool in the past so it still has this option set? [1] https://docs.ceph.com/en/latest/rados/operations/cache-tiering/#configuring-a-cache-tier

[ceph-users] Re: OSDs getting OOM-killed right after startup

2022-06-10 Thread Eugen Block
gets OOM-killed. So.. seems I can get my cluster running again, only limited by my internet upload now. Any hints why it eats a lot of memory in normal operation would still be appreciated. Best, Mara On Wed, Jun 08, 2022 at 09:05:52AM +, Eugen Block wrote: It's even worse, you only give

[ceph-users] Re: RBD clone size check

2022-06-10 Thread Eugen Block
Hi, you can either use 'rbd du' command: control01:~ # rbd --id cinder du images/01b01349-a11c-489c-8349-4c5be9523c58 NAME PROVISIONED USED 01b01349-a11c-489c-8349-4c5be9523c58@snap2 GiB 2 GiB 01b01349-a11c-489c-8349-4c5be9523c58 2 GiB

[ceph-users] Re: Troubleshooting cephadm - not deploying any daemons

2022-06-09 Thread Eugen Block
1 7f7c1ef9fb80 DEBUG sestatus: Memory protection checking: actual (secure) 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Max kernel policy version:  31 On 2022-06-08 4:30 PM, Eugen Block wrote: Have you checked /var/log/ceph/cephadm.log on the target nodes? Zitat von "Z

[ceph-users] Re: Troubleshooting cephadm - not deploying any daemons

2022-06-08 Thread Eugen Block
Have you checked /var/log/ceph/cephadm.log on the target nodes? Zitat von "Zach Heise (SSCC)" :  Yes, sorry - I tried both 'ceph orch apply mgr "ceph01,ceph03"' and 'ceph orch apply mds "ceph04,ceph05"' before writing this initial email - once again, the same logged message: "6/8/22 2:25:12

[ceph-users] Re: OSDs getting OOM-killed right after startup

2022-06-08 Thread Eugen Block
It's even worse, you only give them 1MB, not GB. Zitat von Eugen Block : Hi, is there any reason you use custom configs? Most of the defaults work well. But you only give your OSDs 1 GB of memory, that is way too low except for an idle cluster without much data. I recommend to remove

[ceph-users] Re: OSDs getting OOM-killed right after startup

2022-06-08 Thread Eugen Block
Hi, is there any reason you use custom configs? Most of the defaults work well. But you only give your OSDs 1 GB of memory, which is way too low except for an idle cluster without much data. I recommend removing the line osd_memory_target = 1048576 and letting ceph handle it. I didn't
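A minimal sketch of how such an override could be dropped or corrected if it lives in the central config store rather than in ceph.conf; the 4 GiB value is only an example, not a recommendation from the thread:
  # drop the custom override so the built-in default applies again
  ceph config rm osd osd_memory_target
  # or set an explicit per-OSD target, e.g. 4 GiB
  ceph config set osd osd_memory_target 4294967296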

[ceph-users] Re: Many errors about PG deviate more than 30% on a new cluster deployed by cephadm

2022-06-08 Thread Eugen Block
destroy the cluster and rebuild it ----- Original Message ----- From: "Eugen Block" To: "ceph-users" Sent: Tuesday, 7 June 2022 15:00:39 Subject: [ceph-users] Re: Many errors about PG deviate more than 30% on a new cluster deployed by cephadm Hi, please share the output of 'ceph o

[ceph-users] Re: rbd deep copy in Luminous

2022-06-08 Thread Eugen Block
Hi, the deep copy feature was introduced in Mimic [1] and I doubt that there will be backports since Luminous has been EOL for quite some time now (as are Mimic and Nautilus, btw). Eugen [1] https://ceph.io/en/news/blog/2018/v13-2-0-mimic-released/ Zitat von Pardhiv Karri : Hi, We are

[ceph-users] Re: Many errors about PG deviate more than 30% on a new cluster deployed by cephadm

2022-06-07 Thread Eugen Block
Hi, please share the output of 'ceph osd pool autoscale-status'. You have very low (too low) PG numbers per OSD (between 0 and 6), did you stop the autoscaler at an early stage? If you don't want to use the autoscaler you should increase the pg_num, but you could set autoscaler to warn
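A short sketch of the two options mentioned above; the pool name and the pg_num value are placeholders:
  # only warn instead of changing pg_num automatically
  ceph osd pool set <pool> pg_autoscale_mode warn
  # or raise the PG count manually
  ceph osd pool set <pool> pg_num 128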

[ceph-users] Multi-active MDS cache pressure

2022-06-02 Thread Eugen Block
Hi, I'm currently debugging a recurring issue with multi-active MDS. The cluster is still on Nautilus and can't be upgraded at this time. There have been many discussions about "cache pressure" and I was able to find the right settings a couple of times, but before I change too much in

[ceph-users] Re: Degraded data redundancy and too many PGs per OSD

2022-06-01 Thread Eugen Block
Hi, how did you end up with that many PGs per OSD? According to your output the pg_autoscaler is enabled, if that was done by the autoscaler I would create a tracker issue for that. Then I would either disable it or set the mode to "warn" and then reduce the pg_num for some of the pools.

[ceph-users] Re: 2 pools - 513 pgs 100.00% pgs unknown - working cluster

2022-05-26 Thread Eugen Block
The first thing I would try is a mgr failover. Zitat von Eneko Lacunza : Hi all, I'm trying to diagnose an issue in a tiny cluster that is showing the following status: root@proxmox3:~# ceph -s   cluster:     id: 80d78bb2-6be6-4dff-b41d-60d52e650016     health: HEALTH_WARN     1/3
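A sketch of the failover, assuming at least one standby mgr exists:
  ceph mgr fail              # recent releases fail the active mgr without arguments
  ceph mgr fail <mgr-name>   # older releases need the name of the active mgr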

[ceph-users] Re: cephadm error mgr not available and ERROR: Failed to add host

2022-05-25 Thread Eugen Block
Hi, first, you can bootstrap a cluster by providing the container image path in the bootstrap command like this: cephadm --image **:5000/ceph/ceph bootstrap --mon-ip ** Check out the docs for an isolated environment [1], I don't think it's a good idea to change the runtime the way you
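A sketch of such a bootstrap call; the registry host, image tag and monitor IP are placeholders:
  cephadm --image registry.local:5000/ceph/ceph:v16.2.7 bootstrap --mon-ip 192.168.1.10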

[ceph-users] Re: Dashboard: SSL error in the Object gateway menu only

2022-05-22 Thread Eugen Block
Hi, in earlier versions (e.g. Nautilus) there was a dashboard command to set the RGW hostname, which is not available in Octopus (I didn’t check Pacific; it probably changed when cephadm took over), so I would assume that it comes from the ‘ceph orch host add’ command and you probably used the host’s

[ceph-users] Re: prometheus retention

2022-05-20 Thread Eugen Block
Hi, I found this request [1] for version 18; it seems as if that’s not easily possible at the moment. [1] https://tracker.ceph.com/issues/54308 Zitat von Vladimir Brik : Hello Is it possible to increase the retention period of the prometheus service deployed with cephadm?

[ceph-users] Re: Ceph RBD pool copy?

2022-05-19 Thread Eugen Block
Hi, I haven’t dealt with this for some time, it used to be a problem in earlier releases. But can’t you just change the ruleset of the glance pool to use the „better“ OSDs? Zitat von Pardhiv Karri : Hi, We have a ceph cluster with integration to Openstack. We are thinking about
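A sketch of reassigning the pool to a different rule, assuming a CRUSH rule for the faster OSDs already exists (the rule name is a placeholder):
  ceph osd pool set glance crush_rule better-osds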

[ceph-users] Re: MDS fails to start with error PurgeQueue.cc: 286: FAILED ceph_assert(readable)

2022-05-18 Thread Eugen Block
Hi, I don’t know what could cause that error, but could you share more details? You seem to have multiple active MDSs, is that correct? Could they be overloaded? What happened exactly, did one MDS fail or all of them? Do the standby MDS report anything different? Zitat von Kuko Armas :

[ceph-users] Re: Upgrade from v15.2.16 to v16.2.7 not starting

2022-05-18 Thread Eugen Block
Do you see anything suspicious in /var/log/ceph/cephadm.log? Also check the mgr logs for any hints. Zitat von Lo Re Giuseppe : Hi, We have happily tested the upgrade from v15.2.16 to v16.2.7 with cephadm on a test cluster made of 3 nodes and everything went smoothly. Today we started

[ceph-users] Re: ceph osd crush move exception

2022-05-11 Thread Eugen Block
. Zitat von zhengyi deng : Hi Eugen Block New node added " ceph osd crush add-bucket 192.168.1.47 host " . Executing "ceph osd crush move 192.168.1.47 root=default " caused ceph-mon to reboot. I solved the problem because there was a choose_args configuration in crushmap

[ceph-users] Re: ceph-crash user requirements

2022-05-10 Thread Eugen Block
Hi, there's a profile "crash" for that. In a lab setup with Nautilus there's one crash client with these caps: admin:~ # ceph auth get client.crash [client.crash] key = caps mgr = "allow profile crash" caps mon = "allow profile crash" On an Octopus cluster
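A sketch of creating such a client with the crash profile; the keyring path is a placeholder:
  ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' -o /etc/ceph/ceph.client.crash.keyring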

[ceph-users] Re: ceph osd crush move exception

2022-05-05 Thread Eugen Block
Hi, can you share your 'ceph osd tree' so it is easier to understand what might be going wrong? I didn't check the script in detail, what exactly do you mean by extending? Do you create new hosts in a different root of the osd tree? Do those new hosts get PGs assigned although they're in a

[ceph-users] Re: Issues with new cephadm cluster

2022-05-04 Thread Eugen Block
Hi, the OSDs log to the journal, so you should be able to capture the logs during startup with 'journalctl -fu ceph-<fsid>@osd.<id>.service' or check after the failure with 'journalctl -u ceph-<fsid>@osd.<id>.service'. Zitat von 7ba335c6-fb20-4041-8c18-1b00efb78...@anonaddy.me: Hello, I've

[ceph-users] Re: Stretch cluster questions

2022-05-03 Thread Eugen Block
Hi, - Can we have multiple pools in a stretch cluster? yes, you can have multiple pools, but apparently they have to be all configured with the stretch rule as you already noted. - Can we have multiple different crush rules in a stretch cluster? It's still a regular ceph cluster, so

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-05-03 Thread Eugen Block
Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- On 2022. May 2., at 18:11, Eugen Block wrote: Email receive

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-05-02 Thread Eugen Block
istvan.sz...@agoda.com --- On 2022. May 2., at 15:59, Eugen Block wrote: Email received from the internet. If in doubt, don't click any link nor open any attachment ! Just to up

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-05-02 Thread Eugen Block
ust speculation at the moment. As a workaround we'll increase osd_max_pg_per_osd_hard_ratio to 5 and see how the next attempt will go. Thanks, Eugen Zitat von Josh Baergen : On Wed, Apr 6, 2022 at 11:20 AM Eugen Block wrote: I'm pretty sure that their cluster isn't anywhere near the li

[ceph-users] Re: Reset dashboard (500 errors because of wrong config)

2022-04-27 Thread Eugen Block
Hi, you could try to set config keys, but I'm not sure if this will work. What do you get if you run this: ceph config-key get mgr/dashboard/_iscsi_config obtained 'mgr/dashboard/_iscsi_config' {"gateways": {"ses7-host1.fqdn": {"service_url": "http://:@:5000"}, "ses7-host2.fqdn":

[ceph-users] Re: cephadm export config

2022-04-23 Thread Eugen Block
Hi, ceph moved away from file based config to a config store within ceph. You only need a minimal ceph.conf to bootstrap a cluster or for clients which you can generate with: ceph config generate-minimal-conf You can dump the current ceph config and integrate into your ceph.conf: ceph
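A sketch of the two commands referenced above:
  # write a minimal ceph.conf suitable for clients
  ceph config generate-minimal-conf > /etc/ceph/ceph.conf
  # dump everything stored in the central config database
  ceph config dump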

[ceph-users] Re: config/mgr/mgr/dashboard/GRAFANA_API_URL vs fqdn

2022-04-22 Thread Eugen Block
tat von grin : On Fri, 22 Apr 2022 06:54:33 +0000 Eugen Block wrote: Hi, > They are either static (so when the manager moves they become dead) > or dynamic (so they will be overwritten the moment the mgr moves), > aren't they? there might be a misunderstanding but the MGR failov

[ceph-users] Re: config/mgr/mgr/dashboard/GRAFANA_API_URL vs fqdn

2022-04-22 Thread Eugen Block
ana server for your cluster so a static URL works fine. I'm really not sure if we're on the same page here, if not please clarify. Zitat von grin : On Thu, 21 Apr 2022 08:52:48 +0000 Eugen Block wrote: there are a bunch of dashboard settings, for example pacific:~ # ceph dashboard set-g

[ceph-users] Re: config/mgr/mgr/dashboard/GRAFANA_API_URL vs fqdn

2022-04-21 Thread Eugen Block
Hi, there are a bunch of dashboard settings, for example pacific:~ # ceph dashboard set-grafana-api-url pacific:~ # ceph dashboard set-prometheus-api-host pacific:~ # ceph dashboard set-alertmanager-api-host and many more. Zitat von cephl...@drop.grin.hu: Hello, I have tried to find the
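A sketch with example endpoints; the URLs and hosts are placeholders for the local monitoring stack:
  ceph dashboard set-grafana-api-url https://grafana.example.com:3000
  ceph dashboard set-prometheus-api-host http://prometheus.example.com:9095
  ceph dashboard set-alertmanager-api-host http://alertmanager.example.com:9093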

[ceph-users] Re: replaced osd's get systemd errors

2022-04-21 Thread Eugen Block
These are probably leftovers of previous OSDs; I remember having to clean up orphaned units from time to time. Compare the UUIDs to your actual OSDs and disable the units of the non-existing OSDs. Zitat von Marc : I added some osd's which are up and running with: ceph-volume lvm create
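A sketch of the cleanup; the unit naming follows the usual ceph-volume pattern, and the OSD id and uuid are placeholders to be matched against 'ceph osd tree' and 'ceph-volume lvm list':
  systemctl list-units 'ceph-volume@*' 'ceph-osd@*'
  # disable a unit whose OSD no longer exists
  systemctl disable ceph-volume@lvm-3-<osd-fsid>.service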

[ceph-users] Re: Reinstalling OSD node managed by cephadm

2022-04-20 Thread Eugen Block
d: /usr/bin/chown -R ceph:ceph /dev/dm-4 /bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-16 /bin/docker: --> ceph-volume lvm activate successful for osd ID: 16 On Wed, Apr 20, 2022 at 2:28 PM Eugen Block wrote: IIUC it's just the arrow that can't be displayed w

[ceph-users] Re: Reinstalling OSD node managed by cephadm

2022-04-20 Thread Eugen Block
trypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE= quay.io/ceph/ceph@sha256:0d927ccbd8892180ee09894c2b2c26d07c938bf96a56eaee9b80fc9f26083ddb -e NODE_NAME=dmz-host-4 -e CEPH_USE_RANDOM_NONCE=1 -v /var/run/ceph/d221bc3c-8ff4-11ec-b4ba-b02628267680:/var/run/ceph:z

[ceph-users] Re: Reinstalling OSD node managed by cephadm

2022-04-20 Thread Eugen Block
] Running command: /usr/sbin/pvs --noheadings --readonly --separator=";" -S lv_uuid=pfWtmF-6Xlc-R2LO-kzeV-2jIw-3Ki8-gCOMwZ -o pv_name,pv_tags,pv_uuid,vg_name,lv_uuid [2022-04-20 10:38:02,301][ceph_volume.process][INFO ] stdout /dev/sdf";"";"j2Ilk4-12ZW-qR9u-3n5Y-gn6B

[ceph-users] Re: Reinstalling OSD node managed by cephadm

2022-04-20 Thread Eugen Block
Hi, have you checked /var/log/ceph/cephadm.log for any hints? ceph-volume.log may also provide some information (/var/log/ceph/<fsid>/ceph-volume.log) about what might be going on. Zitat von Manuel Holtgrewe : Dear all, I now attempted this and my host is back in the cluster but the `ceph cephadm

[ceph-users] Re: Cephfs scalability question

2022-04-20 Thread Eugen Block
Hi, Is it advisable to limit the sizes of data pools or metadata pools of a cephfs filesystem for performance or other reasons? I assume you don't mean quotas for pools, right? The pool size is limited by the number and size of the OSDs, of course. I can't really say what's advisable or

[ceph-users] Re: Ceph RGW Multisite Multi Zonegroup Build Problems

2022-04-19 Thread Eugen Block
Hi, unless there are copy/paste mistakes involved I believe you shouldn't specify '--master' for the secondary zone because you did that already for the first zone which is supposed to be the master zone. You specified '--rgw-zone=us-west-1' as the master zone within your realm, but then

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-11 Thread Eugen Block
Thanks, I’ll take a closer look at that. Zitat von Josh Baergen : Hi Eugen, how did you determine how many PGs were assigned to the OSDs? I looked at one of the OSD's logs and checked how many times each PG chunk of the affected pool was logged during startup. I got around 580 unique

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-11 Thread Eugen Block
Hi, It could be, yes. I've seen a case on a test cluster where thousands of PGs were assigned to a single OSD even when the steady state was far fewer than that. how did you determine how many PGs were assigned to the OSDs? I looked at one of the OSD's logs and checked how many times each

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-08 Thread Eugen Block
p": "chooseleaf_indep", "num": 9, "type": "host" }, { "op": "emit" } ] },

[ceph-users] Re: Ceph status HEALT_WARN - pgs problems

2022-04-07 Thread Eugen Block
ot; "Prometheus/2.18.1" Apr 7 11:21:16 hvs001 bash[2670]: debug 2022-04-07T11:21:16.267+ 7f514f9b2700 0 [prometheus INFO cherrypy.access.139987709758544] :::10.3.1.23 - - [07/Apr/2022:11:21:16] "GET /metrics HTTP/1.1" 200 166748 "" "Prometheus/2.18.1" ___

[ceph-users] Re: Ceph status HEALT_WARN - pgs problems

2022-04-07 Thread Eugen Block
Hi, please add some more output, e.g. ceph -s ceph osd tree ceph osd pool ls detail ceph osd crush rule dump (of the used rulesets) You have the pg_autoscaler enabled, you don't need to deal with pg_num manually. Zitat von Dominique Ramaekers : Hi, My cluster is up and running. I saw a

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-07 Thread Eugen Block
tat von Josh Baergen : On Wed, Apr 6, 2022 at 11:20 AM Eugen Block wrote: I'm pretty sure that their cluster isn't anywhere near the limit for mon_max_pg_per_osd, they currently have around 100 PGs per OSD and the configs have not been touched, it's pretty basic. How is the host being

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Eugen Block
Thanks for the comments, I'll get the log files to see if there's any hint. Getting the PGs in an active state is one thing, I'm sure multiple approaches would have worked. The main question is why this happens, we have 19 hosts to rebuild and can't risk the application outage every time.

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Eugen Block
ine. Zitat von Zakhar Kirpichenko : Hi Eugen, Can you please elaborate on what you mean by "restarting the primary PG"? Best regards, Zakhar On Wed, Apr 6, 2022 at 5:15 PM Eugen Block wrote: Update: Restarting the primary PG helped to bring the PGs back to active state. Consider this

[ceph-users] Re: Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Eugen Block
Update: Restarting the primary PG helped to bring the PGs back to active state. Consider this thread closed. Zitat von Eugen Block : Hi all, I have a strange situation here, a Nautilus cluster with two DCs, the main pool is an EC pool with k7 m11, min_size = 8 (failure domain host). We
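A sketch of how the primary OSD of a stuck PG could be identified and restarted on a package-based (non-containerized) Nautilus node; the pgid and OSD id are placeholders:
  ceph pg map <pgid>                        # first OSD in the acting set is the primary
  systemctl restart ceph-osd@<id>.service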

[ceph-users] Ceph PGs stuck inactive after rebuild node

2022-04-06 Thread Eugen Block
Hi all, I have a strange situation here, a Nautilus cluster with two DCs, the main pool is an EC pool with k7 m11, min_size = 8 (failure domain host). We confirmed failure resiliency multiple times for this cluster, today we rebuilt one node resulting in currently 34 inactive PGs. I'm

[ceph-users] Re: mons on osd nodes with replication

2022-04-06 Thread Eugen Block
Hi Ali, it's very common to have MONs and OSDs colocated on the same host. Zitat von Ali Akil : Hello together, I am planning a Ceph cluster on 3 storage nodes (12 OSDs per cluster with Bluestore). Each node has 192 GB of memory and 24 cores of CPU. I know it's recommended to have

[ceph-users] Re: Ceph remote disaster recovery at PB scale

2022-04-06 Thread Eugen Block
Thanks for the clarification, I get it now. This would be quite helpful to have in the docs, I believe. ;-) Zitat von Arthur Outhenin-Chalandre : Hi Eugen, On 4/6/22 09:47, Eugen Block wrote: I don't mean to hijack this thread, I'm just curious about the multiple mirror daemons statement

[ceph-users] Re: RuntimeError on activate lvm

2022-04-06 Thread Eugen Block
Hi, is there any specific reason why you do it manually instead of letting cephadm handle it? I might misremember but I believe for the manual lvm activation to work you need to pass the '--no-systemd' flag. Regards, Eugen Zitat von Dominique Ramaekers : Hi, I've setup a ceph cluster
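A sketch of the manual activation with the flag mentioned above; the OSD id and OSD fsid are placeholders:
  ceph-volume lvm activate --no-systemd <osd-id> <osd-fsid>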

[ceph-users] Re: Ceph remote disaster recovery at PB scale

2022-04-06 Thread Eugen Block
Hi, I don't mean to hijack this thread, I'm just curious about the multiple mirror daemons statement. Last year you mentioned that multiple daemons only make sense if you have different pools to mirror [1], at least that's how I read it; you wrote: [...] but actually you can have multiple

[ceph-users] Re: can't deploy osd/db on nvme with other db logical volume

2022-04-04 Thread Eugen Block
ssh the host and execute the command for each OSD. If we have to add many OSDs, it will take lots of time. On Mon, Apr 4, 2022 at 3:42 PM Eugen Block wrote: Hi, this is handled by ceph-volume, do you find anything helpful in /var/log/ceph/<fsid>/ceph-volume.log? Also check the cephadm.log for any

[ceph-users] Re: can't deploy osd/db on nvme with other db logical volume

2022-04-04 Thread Eugen Block
Hi, this is handled by ceph-volume, do you find anything helpful in /var/log/ceph/<fsid>/ceph-volume.log? Also check the cephadm.log for any hints. Zitat von 彭勇 : we have a running ceph, 16.2.7, with SATA OSD and DB on nvme. and we insert some SATA to host, and the status of new host is

[ceph-users] Re: Ceph rbd mirror journal pool

2022-04-04 Thread Eugen Block
Hi samuel, I haven't used dedicated rbd journal pools, so I can't comment on that. But there's an alternative to journal-based mirroring: you can also mirror based on snapshots [1]. Would this be an alternative for you to look deeper into? Regards, Eugen [1]
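A sketch of enabling snapshot-based mirroring; the pool and image names and the schedule interval are placeholders:
  rbd mirror pool enable <pool> image
  rbd mirror image enable <pool>/<image> snapshot
  # optional: create mirror snapshots automatically
  rbd mirror snapshot schedule add --pool <pool> --image <image> 1h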

[ceph-users] Re: zap an osd and it appears again

2022-03-31 Thread Eugen Block
Zitat von Alfredo Rezinovsky : Yes. osd.all-available-devices 0 - 3h osd.dashboard-admin-1635797884745 7 4m ago 4M * How should I disable the creation? On Wed, 30 Mar 2022 at 17:24, Eugen Block () wrote: Do you have other
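A sketch of how the automatic OSD creation could be stopped, based on the documented 'unmanaged' mechanism; the service name is taken from the output quoted above:
  ceph orch ls osd
  ceph orch apply osd --all-available-devices --unmanaged=true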

[ceph-users] Re: zap an osd and it appears again

2022-03-30 Thread Eugen Block
Do you have other osd services defined which would apply to the affected host? Check ‚ceph orch ls‘ for other osd services. Zitat von Alfredo Rezinovsky : I want to create osds manually If I zap the osd 0 with: ceph orch osd rm 0 --zap as soon as the dev is available the orchestrator

[ceph-users] Re: Fighting with cephadm; inconsistent maintenance mode, forever starting daemons

2022-03-29 Thread Eugen Block
Hi, I would recommend focusing on one issue at a time and trying to resolve it first. It is indeed a lot to read and not really clear which issues could be connected. Can you start with your current cluster status (ceph -s) and some basic outputs like 'ceph orch ls', 'ceph orch ps'

[ceph-users] Re: ceph mon failing to start

2022-03-28 Thread Eugen Block
Hi, does the failed MON's keyring file contain the correct auth caps? Then I would also remove the local (failed) MON's store.db before rejoining. Zitat von Tomáš Hodek : Hi, I have 3 node ceph cluster (managed via proxmox). Got single node fatal failure and replaced it. Os boots

[ceph-users] Re: Changing PG size of cache pool

2022-03-28 Thread Eugen Block
not recommended. Zitat von Daniel Persson : Hi Eugen. I've tried. The system says it's not recommended but I may force it. Forcing something with the risk of losing data is not something I'm going to do. Best regards Daniel On Sat, Mar 26, 2022 at 8:55 PM Eugen Block wrote: Hi, just because

[ceph-users] Re: Changing PG size of cache pool

2022-03-26 Thread Eugen Block
Hi, just because the autoscaler doesn’t increase the pg_num doesn’t mean you can’t increase it manually. Have you tried that? Zitat von Daniel Persson : Hi Team. We are currently in the process of changing the size of our cache pool. Currently it's set to 32 PGs and distributed weirdly on

[ceph-users] Re: ceph namespace access control

2022-03-25 Thread Eugen Block
ter option. Thanks! Zitat von Ilya Dryomov : On Fri, Mar 25, 2022 at 10:11 AM Eugen Block wrote: Hi, I was curious and tried the same with debug logs. One thing I noticed was that if I use the '-k <keyring>' option I get a different error message than with '--id user3'. So with '-k' the result is the same

[ceph-users] Re: [ERR] OSD_FULL: 1 full osd(s) - with 73% used

2022-03-25 Thread Eugen Block
ened again. It is like ceph thinks the osd is still full, as it was before... On Wed, 23 Mar 2022 at 14:38, Eugen Block wrote: Without having an answer to the question why the OSD is full I'm wondering why the OSD has a crush weight of 1.2 while its size is only 1 TB. Was th

[ceph-users] Re: ceph namespace access control

2022-03-25 Thread Eugen Block
Hi, I was curious and tried the same with debug logs. One thing I noticed was that if I use the '-k <keyring>' option I get a different error message than with '--id user3'. So with '-k' the result is the same: ---snip--- pacific:~ # rbd -k /etc/ceph/ceph.client.user3.keyring -p test2 --namespace

[ceph-users] Re: [ERR] OSD_FULL: 1 full osd(s) - with 73% used

2022-03-23 Thread Eugen Block
Without having an answer to the question why the OSD is full I'm wondering why the OSD has a crush weight of 1.2 while its size is only 1 TB. Was that changed on purpose? I'm not sure if that would explain the OSD full message, though. Zitat von Rodrigo Werle : Hi everyone! I'm

[ceph-users] Re: Pacific : ceph -s Data: Volumes: 1/1 healthy

2022-03-22 Thread Eugen Block
How about this one? https://docs.ceph.com/en/latest/cephfs/fs-volumes/ Zitat von Rafael Diaz Maurin : Hi cephers, Under Pacific, I just noticed new info when running a 'ceph -s': [...]   data:     volumes: 1/1 healthy [...] I can't find the info in the Ceph docs, does anyone know what

[ceph-users] Re: orch apply failed to use insecure private registry

2022-03-21 Thread Eugen Block
Hi, Setting mgr/cephadm/registry_insecure to false doesn't help. if you want to use an insecure registry you would need to set this option to true, not false. I am using podman and /etc/containers/registries.conf is set with that insecure private registry. Can you paste the whole
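A sketch, assuming the option is set like other cephadm mgr module settings:
  ceph config set mgr mgr/cephadm/registry_insecure true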

[ceph-users] Re: Migrating OSDs to dockerized ceph

2022-03-11 Thread Eugen Block
clear. Thanks again, Zach On 3/12/22 00:07, Eugen Block wrote: Hi, are you also planning to switch to cephadm? In that case you could just adopt all the daemons [1], I believe docker would also work (I use it with podman). [1] https://docs.ceph.com/en/pacific/cephadm/adoption.html

[ceph-users] Re: "Incomplete" pg's

2022-03-07 Thread Eugen Block
Hi, IIUC the OSDs 3,4,5 have been removed while some PGs still refer to them, correct? Have the OSDs been replaced with the same IDs? If not (so there are currently no OSDs with IDs 3,4,5 in your osd tree) maybe marking them as lost [1] would resolve the stuck PG creation, although I
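A sketch of marking a removed OSD as lost; this tells Ceph to give up on that copy of the data, so it should only be run if the OSD is definitely gone for good (the id is a placeholder):
  ceph osd lost 3 --yes-i-really-mean-it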

[ceph-users] Re: Failed in ceph-osd -i ${osd_id} --mkfs -k /var/lib/ceph/osd/ceph-${osd_id}/keyring

2022-03-07 Thread Eugen Block
) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (2) No such file or directory Can you clarify? Zitat von huxia...@horebdata.cn: thanks, Eugen. I am suspecting this perhaps could be related to https://tracker.ceph.com/issues/42223 huxia...@horebdata.cn From: Eugen

[ceph-users] Re: Failed in ceph-osd -i ${osd_id} --mkfs -k /var/lib/ceph/osd/ceph-${osd_id}/keyring

2022-03-05 Thread Eugen Block
Hi, there must be some mixup of the OSD IDs. Your command seems to use ID 3 but the log complains about ID 1. You should check your script and workflow. Zitat von huxia...@horebdata.cn: Dear Ceph folks, I encountered a strange behavior with Luminous 12.2.13, when running the following

[ceph-users] Re: Journal size recommendations

2022-03-01 Thread Eugen Block
Hi, can you be more specific what exactly you are looking for? Are you talking about the rocksDB size? And what is the unit for 5012? It’s really not clear to me what you’re asking. And since the recommendations vary between different use cases you might want to share more details about

[ceph-users] Re: WG: Multisite sync issue

2022-02-25 Thread Eugen Block
ts. So far no loadbalancer has been put into place there. Best, Julian -Original Message- From: Eugen Block Sent: Friday, 25 February 2022 10:52 To: ceph-users@ceph.io Subject: [ceph-users] Re: WG: Multisite sync issue This email originated from outside of CGM. Please do not c

[ceph-users] Re: WG: Multisite sync issue

2022-02-25 Thread Eugen Block
Hi, I would stop alle RGWs except one in each cluster to limit the places and logs to look at. Do you have a loadbalancer as endpoint or do you have a list of all RGWs as endpoints? Zitat von "Poß, Julian" : Hi, i did setup multisite with 2 ceph clusters and multiple rgw's and

[ceph-users] Re: OSD Container keeps restarting after drive crash

2022-02-24 Thread Eugen Block
Hi, these are the defaults set by cephadm in Octopus and Pacific: ---snip--- [Service] LimitNOFILE=1048576 LimitNPROC=1048576 EnvironmentFile=-/etc/environment ExecStart=/bin/bash {data_dir}/{fsid}/%i/unit.run ExecStop=-{container_path} stop ceph-{fsid}-%i ExecStopPost=-/bin/bash

[ceph-users] Re: ceph os filesystem in read only

2022-02-24 Thread Eugen Block
Hi, 1. How long will ceph continue to run before it starts complaining about this? Looks like it is fine for a few hours, ceph osd tree and ceph -s, seem not to notice anything. if the OSDs don't have to log anything to disk (which can take quite some time depending on the log settings)

[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Eugen Block
. On Wed, 23 Feb 2022 at 11:41, Eugen Block wrote: Hi, > How can I identify which operation this OSD is trying to achieve as > osd_op() is a bit large ^^ ? I would start by querying the OSD for historic_slow_ops: ceph daemon osd.<id> dump_historic_slow_ops to see which operation it is. >

[ceph-users] Re: Unclear on metadata config for new Pacific cluster

2022-02-23 Thread Eugen Block
Hi, if you want to have DB and WAL on the same device, just don't specify WAL in your drivegroup. It will be automatically created on the DB device, too. In your case the rotational flag should be enough to distinguish between data and DB. based on the suggestion in the docs that this

[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Eugen Block
Hi, How can I identify which operation this OSD is trying to achieve as osd_op() is a bit large ^^ ? I would start by querying the OSD for historic_slow_ops: ceph daemon osd.<id> dump_historic_slow_ops to see which operation it is. How can I identify the related images to this data chunk?
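A sketch, run on the host of the affected OSD; the id 42 is a placeholder:
  ceph daemon osd.42 dump_historic_slow_ops
  # current in-flight requests can also help identify the client and object
  ceph daemon osd.42 dump_ops_in_flight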

[ceph-users] Re: Ceph EC K+M

2022-02-21 Thread Eugen Block
:istvan.sz...@agoda.com> --- On 2022. Feb 21., at 19:20, Eugen Block wrote: Email received from the internet. If in doubt, don't click any link nor open any attachment ! Hi, it really depends on the resiliency requirements and the use case. We

[ceph-users] Re: Ceph EC K+M

2022-02-21 Thread Eugen Block
Hi, it really depends on the resiliency requirements and the use case. We have a couple of customers with EC profiles like k=7 m=11. The potential waste of space as Anthony already mentions has to be considered, of course. But with regards to performance we haven't heard any complaints

[ceph-users] Re: Problem with Ceph daemons

2022-02-16 Thread Eugen Block
Can you retry after resetting the systemd unit? The message "Start request repeated too quickly." should be cleared first, then start it again: systemctl reset-failed ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service systemctl start
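The generic pattern behind those two commands, with the fsid and daemon name as placeholders:
  systemctl reset-failed ceph-<fsid>@rgw.<name>.service
  systemctl start ceph-<fsid>@rgw.<name>.service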

[ceph-users] Re: Need feedback on cache tiering

2022-02-16 Thread Eugen Block
level caching. Mark On 2/16/22 10:18, Eugen Block wrote: Hi, we've noticed the warnings for quite some time now, but we're big fans of the cache tier. :-) IIRC we set it up some time around 2015 or 2016 for our production openstack environment and it works nicely for us. We tried

[ceph-users] Re: Need feedback on cache tiering

2022-02-16 Thread Eugen Block
Hi, we've noticed the warnings for quite some time now, but we're big fans of the cache tier. :-) IIRC we set it up some time around 2015 or 2016 for our production openstack environment and it works nicely for us. We tried it without the cache some time after we switched to Nautilus but

[ceph-users] Re: cephadm: update fewer OSDs at a time?

2022-02-14 Thread Eugen Block
, 2022 at 11:21 AM Eugen Block wrote: It does update only one OSD at a time, I did that in my little test cluster on Octopus today. I haven’t played too much with Pacific yet, maybe some things have changed there? Zitat von Zakhar Kirpichenko : > Hi Eugen, > > Thanks for this. All of

[ceph-users] Re: cephadm: update fewer OSDs at a time?

2022-02-14 Thread Eugen Block
of 1 host at a time we could resolve this issue. /Z On Mon, Feb 14, 2022 at 4:26 PM Eugen Block wrote: Hi, what are your rulesets for the affected pools? As far as I remember the orchestrator updates one OSD node at a time, but not multiple OSDs at once, only one by one. It checks with the &qu

[ceph-users] Re: cephadm: update fewer OSDs at a time?

2022-02-14 Thread Eugen Block
Hi, what are your rulesets for the affected pools? As far as I remember the orchestrator updates one OSD node at a time, but not multiple OSDs at once, only one by one. It checks with the "ok-to-stop" command if an upgrade of that daemon can proceed, so as long as you have host as
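The same check can be run manually; the OSD id is a placeholder:
  ceph osd ok-to-stop 12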

[ceph-users] Re: RBD map issue

2022-02-14 Thread Eugen Block
5bbddf414a format: 2 features: layering, exclusive-lock, data-pool op_features: flags: create_timestamp: Thu Feb 10 18:17:42 2022 access_timestamp: Thu Feb 10 18:17:42 2022 modify_timestamp: Thu Feb 10 18:17:42 2022 Giuseppe On 11.02.22, 14:52, &q

[ceph-users] Re: osds won't start

2022-02-14 Thread Eugen Block
uot;, "bfm_blocks_per_key": "128", "bfm_bytes_per_block": "4096", "bfm_size": "6001171365888", "bluefs": "1", "ceph_fsid": "1234abcd-1234-abcd-1234-1234 abcd1234",

[ceph-users] Re: RBD map issue

2022-02-11 Thread Eugen Block
-dcache-data, profile rbd pool=fulen-dcache-meta, profile rbd pool=fulen-hdd-data, profile rbd pool=fulen-nvme-meta" On 11.02.22, 13:22, "Eugen Block" wrote: Hi, the first thing coming to mind are the user's caps. Which permissions do they have? Have you compa

[ceph-users] Re: RBD map issue

2022-02-11 Thread Eugen Block
Hi, the first thing coming to mind are the user's caps. Which permissions do they have? Have you compared 'ceph auth get client.fulen' on both clusters? Please paste the output from both clusters and redact sensitive information. Zitat von Lo Re Giuseppe : Hi all, This is my first

[ceph-users] Re: slow pacific osd startup

2022-02-11 Thread Eugen Block
Hi, is there a difference in PG size on new and old OSDs or are they all similar in size? Is there some fsck enabled during OSD startup? Zitat von Andrej Filipcic : Hi, with 16.2.7, some OSDs are very slow to start, eg it takes ~30min for an hdd (12TB, 5TB used) to become active. After

[ceph-users] Re: osds won't start

2022-02-11 Thread Eugen Block
Can you share some more information about how exactly you upgraded? It looks like a cephadm-managed cluster. Did you install OS updates on all nodes without waiting for the first one to recover? Maybe I'm misreading, so please clarify what your update process looked like. Zitat von Mazzystr : I

[ceph-users] Re: Changing prometheus default alerts with cephadm

2022-02-04 Thread Eugen Block
Hi, you should be able to change that in the config file /var/lib/ceph/<fsid>/prometheus.ses7-host1/etc/prometheus/alerting/ceph_alerts.yml and restart the containers. Regards, Eugen Zitat von Manuel Holtgrewe : Dear all, I wonder how I can adjust the default alerts generated by prometheus when
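A sketch of restarting the monitoring daemons after editing the rules file, assuming a cephadm-managed prometheus service:
  ceph orch restart prometheus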

[ceph-users] Re: How to remove stuck daemon?

2022-01-26 Thread Eugen Block
Hi, have you tried to failover the mgr service? I noticed similar behaviour in Octopus. Zitat von Fyodor Ustinov : Hi! No one knows how to fix it? - Original Message - From: "Fyodor Ustinov" To: "ceph-users" Sent: Tuesday, 25 January, 2022 11:29:53 Subject: [ceph-users] How

[ceph-users] Re: ceph-mon is low on available space

2022-01-21 Thread Eugen Block
Hi, this is a disk space warning. If the MONs get below 30% free disk space you'll get a warning since a MON store can grow in case of recovery for a longer period of time. Use 'df -h' and you'll probably see /var/lib/containers/ with less than 30% free space. You can either decrease
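A sketch of checking the filesystem and, if the situation is understood, lowering the warning threshold; the default is 30% and the 20% value is only an example:
  df -h /var/lib/ceph /var/lib/containers
  ceph config set mon mon_data_avail_warn 20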

[ceph-users] Re: ceph orch osd daemons "stopped"

2022-01-07 Thread Eugen Block
Have you also tried this? # ceph orch daemon restart osd.12 Without the "daemon" you would try to restart an entire service called "osd.12" which obviously doesn't exist. With "daemon" you can restart specific daemons. Zitat von Manuel Holtgrewe : Dear all, I'm running Pacific 16.2.7

[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2021-12-20 Thread Eugen Block
Hi, you wrote that this cluster was initially installed with Octopus, so no upgrade ceph wise? Are all RGW daemons on the exact same ceph (minor) versions? I remember one of our customers reporting inconsistent objects on a regular basis although no hardware issues were detectable. They
