[ceph-users] Re: 1 MDS report slow metadata IOs

2021-10-06 Thread Eugen Block
it help to just one mds and one monitor ? thanks! On Tue, Oct 5, 2021 at 1:42 PM Eugen Block wrote: All your PGs are inactive: if two of four OSDs are down and you probably have a pool size of 3, then no IO can be served. You’d need at least three up OSDs to resolve that. Zitat von Abdelillah

[ceph-users] Re: 1 MDS report slow metadata IOs

2021-10-05 Thread Eugen Block
All your PGs are inactive: if two of four OSDs are down and you probably have a pool size of 3, then no IO can be served. You’d need at least three up OSDs to resolve that. Zitat von Abdelillah Asraoui : Ceph is reporting a warning about slow metadata IOs on one of the MDS servers, this is a new

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Eugen Block
ugh memory. From: Szabo, Istvan (Agoda)<mailto:istvan.sz...@agoda.com> Sent: 4 October 2021 0:46 To: Igor Fedotov<mailto:ifedo...@suse.de> Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io> Subject: [ceph-users] Re: is it possible to remove the db+wal from an external device (nvme) Seem

[ceph-users] Re: osd marked down

2021-10-04 Thread Eugen Block
7f8633cc1f00 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-3/keyring: (13) Permission denied debug 2021-10-04T16:06:38.288+ 7f8633cc1f00 -1 monclient: keyring not found failed to fetch mon config (--no-mon-config to skip) thanks! On Fri, Oct 1, 2021 at 2:02 AM Eugen Block wrote: I'm

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-01 Thread Eugen Block
I can't access the pastebin, did you verify if you hit the same issue as Stefan referenced (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/)? Before deleting or rebuilding anything I would first check what the root cause is. As Stefan said,

[ceph-users] Re: Failing to mount PVCs

2021-10-01 Thread Eugen Block
Hi, I'm not entirely sure if this really is the same issue here. One of our customers also works with k8s in openstack and I saw similar messages. We never investigated it, I don't know if the customer did, but one thing they encountered was that k8s didn't properly clean up

[ceph-users] Re: Rbd mirror

2021-10-01 Thread Eugen Block
Hi, I don't know for sure but I believe you can have only one rbd mirror daemon per cluster. So you can either configure one-way or two-way mirroring between two clusters. With your example the third cluster would then require two mirror daemons which is not possible AFAIK. I can't tell

[ceph-users] Re: dealing with unfound pg in 4:2 ec pool

2021-10-01 Thread Eugen Block
Hi, I'm not sure if setting min_size to 4 would also fix the PGs, but the client IO would probably be restored. Marking it as lost is the last resort according to this list; luckily I haven't been in such a situation yet. So give it a try with min_size = 4 but don't forget to increase
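
A minimal sketch of that suggestion for a 4+2 EC pool (the pool name 'ecpool' is a placeholder; as noted above this is untested advice):

  # check the current value first
  ceph osd pool get ecpool min_size
  # allow IO with only k=4 shards available, as a temporary measure only
  ceph osd pool set ecpool min_size 4
  # once recovery has finished, restore the safer k+1 value
  ceph osd pool set ecpool min_size 5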

[ceph-users] Re: osd marked down

2021-10-01 Thread Eugen Block
n Thu, Sep 30, 2021 at 1:18 AM Eugen Block wrote: Is the content of OSD.3 still available in the filesystem? If the answer is yes you can get the OSD's keyring from /var/lib/ceph/osd/ceph-3/keyring Then update your osd.3.export file with the correct keyring and then import the correct back t

[ceph-users] Re: New Ceph cluster in PRODUCTION

2021-09-30 Thread Eugen Block
Hi, there is no information about your ceph cluster, e.g. hdd/ssd/nvme disks. This information can be crucial with regard to performance. Also, why would you use osd_pool_default_min_size = 1 and osd_pool_default_size = 2? There have been endless discussions in this list why a pool size of

[ceph-users] Re: osd marked down

2021-09-30 Thread Eugen Block
must have imported osd.2 key instead, now osd.3 has the same key as osd.2 ceph auth import -i osd.3.export How do we update this ? thanks! On Wed, Sep 29, 2021 at 2:13 AM Eugen Block wrote: Just to clarify, you didn't simply import the unchanged keyring but modified it to reflect the actual

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-30 Thread Eugen Block
e Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- -Original Message----- From: Eugen Block Sent: Wednesday, September 29, 2021 8:49 PM To: 胡 玮文 Cc: Igor Fedotov ; Szabo, Istvan (Agoda) ; ceph-users@ceph.io Subject: Re: is i

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
ceph-volume in the new shell. From: Eugen Block<mailto:ebl...@nde.ag> Sent: 29 September 2021 21:40 To: 胡 玮文<mailto:huw...@outlook.com> Cc: Igor Fedotov<mailto:ifedo...@suse.de>; Szabo, Istvan (Agoda)<mailto:istvan.sz...@agoda.com>; ceph-users@ceph.io<mailto:ceph-users@ceph.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
14:26 unit.poststop -rw--- 1 ceph ceph 3021 Sep 17 14:26 unit.run -rw--- 1 ceph ceph 142 Sep 17 14:26 unit.stop -rw--- 1 ceph ceph 2 Sep 20 04:15 whoami From: Eugen Block<mailto:ebl...@nde.ag> Sent: 29 September 2021 21:29 To: Igor Fedotov<mailto:ifedo...@suse.de> Cc: 胡 玮文<mail

[ceph-users] Re: osd marked down

2021-09-29 Thread Eugen Block
Just to clarify, you didn't simply import the unchanged keyring but modified it to reflect the actual key of OSD.3, correct? If not, run 'ceph auth get osd.3' first and set the key in the osd.3.export file before importing it to ceph. Zitat von Abdelillah Asraoui : i have created
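
A hedged sketch of that keyring check and re-import (the file name osd.3.export is taken from this thread):

  # see which key the cluster actually expects for the OSD
  ceph auth get osd.3
  # put exactly that key into the osd.3.export file, then re-import it
  ceph auth import -i osd.3.export
  # the keyring on the OSD host must contain the same key
  cat /var/lib/ceph/osd/ceph-3/keyring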

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-28 Thread Eugen Block
abo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- -Original Message----- From: Eugen Block Sent: Monday, September 27, 2021 7:42 PM To: ceph-users@ceph.io Subject: [

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-27 Thread Eugen Block
mit: 1 wal_devices: rotational: 0 limit: 1 Do you know what to change to apply the plan you described? I'd be happy to try it! From: Eugen Block To: ceph-users@ceph.io Cc: Bcc: Date: Mon, 27 Sep 2021 10:06:43 + Subject: [ceph-users] Re: Orchestrator is internally ignoring applying a

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-27 Thread Eugen Block
Hi, I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use here. I haven't tried it in a production environment yet, only in virtual labs. Regards, Eugen Zitat von "Szabo, Istvan (Agoda)" : Hi, Seems like in our config the nvme device as a wal+db in front of the ssd
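
As an untested sketch (OSD id 0 and the standard non-containerized paths are assumptions), migrating the DB/WAL from the external device back onto the main block device could look like this, with the OSD stopped:

  systemctl stop ceph-osd@0
  ceph-bluestore-tool bluefs-bdev-migrate \
      --path /var/lib/ceph/osd/ceph-0 \
      --devs-source /var/lib/ceph/osd/ceph-0/block.db \
      --dev-target /var/lib/ceph/osd/ceph-0/block
  systemctl start ceph-osd@0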

[ceph-users] Re: ceph_add_cap: couldn't find snap realm 110

2021-09-27 Thread Eugen Block
, Eugen Block wrote: Good morning, could anyone tell me if the patch [1] for this tracker issue [2] is already available in any new (open)SUSE kernel (maybe Leap 15.3)? We seem to be hitting [2] on openSUSE Leap 15.1 and if there's a chance to fix it by upgrading the kernel it would be great news

[ceph-users] Re: Problem with adopting 15.2.14 cluster with cephadm on CentOS 7

2021-09-27 Thread Eugen Block
Hi, the log states: 2021-09-27 10:47:20,415 DEBUG Could not locate podman: podman not found Have you verified that it's installed? Zitat von Manuel Holtgrewe : Hi, I have a 15.2.14 ceph cluster running on an up to date CentOS 7 that I want to adopt to cephadm. I'm trying to follow this:

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-27 Thread Eugen Block
Hi, I read your first email again and noticed that ceph-volume already identifies the drives sdr and sds as non-rotational and as available. That would also explain the empty rejected_reasons field because they are not rejected (at this stage?). Where do you read that information that

[ceph-users] ceph_add_cap: couldn't find snap realm 110

2021-09-27 Thread Eugen Block
Good morning, could anyone tell me if the patch [1] for this tracker issue [2] is already available in any new (open)SUSE kernel (maybe Leap 15.3)? We seem to be hitting [2] on openSUSE Leap 15.1 and if there's a chance to fix it by upgrading the kernel it would be great news! Thanks!

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-24 Thread Eugen Block
Hi, as a workaround you could just set the rotational flag by yourself: echo 0 > /sys/block/sd[X]/queue/rotational That's the one ceph-volume is searching for and it should at least enable you to deploy the rest of the OSDs. Of course, you'll need to figure out why the rotational flag is
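
For example (sdX is a placeholder for the affected SSD; the flag is reset on reboot, so the underlying detection issue still needs fixing):

  cat /sys/block/sdX/queue/rotational      # shows 1 even though the device is an SSD
  echo 0 > /sys/block/sdX/queue/rotational
  ceph orch device ls --refresh            # ceph-volume should now treat it as non-rotational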

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-22 Thread Eugen Block
/ containers but that is still a WIP. (Our situation is complicated by the fact that we'll need to continue puppet managing things like firewall with cephadm doing the daemon placement). Cheers, Dan On Wed, Sep 22, 2021 at 10:32 AM Eugen Block wrote: Thanks for the summary, Dan! I'm still

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-22 Thread Eugen Block
Thanks for the summary, Dan! I'm still hesitating upgrading our production environment from N to O, your experience sounds reassuring though. I have one question, did you also switch to cephadm and containerize all daemons? We haven't made a decision yet, but I guess at some point we'll

[ceph-users] Re: Modify pgp number after pg_num increased

2021-09-22 Thread Eugen Block
Hi, IIRC in a different thread you pasted your max-backfill config and it was the lowest possible value (1), right? That's why your backfill is slow. Zitat von "Szabo, Istvan (Agoda)" : Hi, By default in the newer versions of ceph when you increase the pg_num the cluster will start

[ceph-users] Re: HEALTH_WARN: failed to probe daemons or devices after upgrade to 16.2.6

2021-09-20 Thread Eugen Block
Hi, Yes! I did play with another cluster before and forgot to completely clear that node! And the fsid "46e2b13c-dab7-11eb-810b-a5ea707f1ea1" from that cluster. But then there is an error in CEPH. Because the mon the existing cluster complained about (with fsid

[ceph-users] Re: Adding cache tier to an existing objectstore cluster possible?

2021-09-20 Thread Eugen Block
And we are quite happy with our cache tier. When we got new HDD OSDs we tested if things would improve without the tier but we had to stick to it, otherwise working with our VMs was almost impossible. But this is an RBD cache so I can't tell how the other protocols perform with a cache

[ceph-users] Re: HEALTH_WARN: failed to probe daemons or devices after upgrade to 16.2.6

2021-09-18 Thread Eugen Block
Hi, Hmm. 'cephadm ls' running directly on the node does show that there is mon. I don't quite understand where it came from and I don't understand why 'ceph orch ps' didn't show this service. Thank you very much for your help. no problem. Maybe you played around and had this node in the

[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-17 Thread Eugen Block
Since I'm trying to test different erasure coding plugins and techniques I don't want the balancer active. So I tried setting it to none as Eugen suggested, and to my surprise I did not get any degraded messages at all, and the cluster was in HEALTH_OK the whole time. Interesting, maybe

[ceph-users] Re: HEALTH_WARN: failed to probe daemons or devices after upgrade to 16.2.6

2021-09-17 Thread Eugen Block
Was there a MON running previously on that host? Do you see the daemon when running 'cephadm ls'? If so, remove it with 'cephadm rm-daemon --name mon.s-26-9-17' Zitat von Fyodor Ustinov : Hi! After upgrading to version 16.2.6, my cluster is in this state: root@s-26-9-19-mon-m1:~# ceph
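
A sketch of that cleanup on the affected host, using the daemon name from this thread:

  cephadm ls | grep mon                  # confirm the stray daemon is still listed
  cephadm rm-daemon --name mon.s-26-9-17
  ceph orch ps --daemon-type mon         # verify the orchestrator no longer reports it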

[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-16 Thread Eugen Block
You’re absolutely right, of course, the balancer wouldn’t cause degraded PGs. Flapping OSDs seems very likely here. Zitat von Josh Baergen : I assume it's the balancer module. If you write lots of data quickly into the cluster the distribution can vary and the balancer will try to even out

[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-16 Thread Eugen Block
Hi, I assume it's the balancer module. If you write lots of data quickly into the cluster the distribution can vary and the balancer will try to even out the placement. You can check the status with ceph balancer status and disable it if necessary: ceph balancer mode none Regards, Eugen

[ceph-users] Re: Docker & CEPH-CRASH

2021-09-16 Thread Eugen Block
m solution? Thanks! []'s Arthur On 15/09/2021 08:30, Eugen Block wrote: Hi, ceph-crash services are standalone containers, they are not running inside other containers: host1:~ # ceph orch ls NAME   RUNNING  REFRESHED  AGE PLACEMENT    

[ceph-users] Re: Docker & CEPH-CRASH

2021-09-15 Thread Eugen Block
Hi, ceph-crash services are standalone containers, they are not running inside other containers: host1:~ # ceph orch ls NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID

[ceph-users] Re: OSD Service Advanced Specification db_slots

2021-09-15 Thread Eugen Block
Hi, db_slots is still not implemented: pacific:~ # ceph orch apply -i osd.yml --dry-run Error EINVAL: Failed to validate Drive Group: Filtering for is not supported Question 2: If db_slots still *doesn't* work, is there a coherent way to divide up a solid state DB drive for use by a

[ceph-users] Re: Health check failed: 1 pools ful

2021-09-15 Thread Eugen Block
Hi Frank, I think the snapshot rotation could be an explanation. Just a few days ago we had a host failure over night and some OSDs couldn't be rebalanced entirely because they were too full. Deleting a few (large) snapshots I created last week resolved the issue. If you monitored 'ceph

[ceph-users] Re: OSD based ec-code

2021-09-14 Thread Eugen Block
Hi, consider yourself lucky that you haven't had a host failure. But I would not draw the wrong conclusions here and change the failure-domain based on luck. In our production cluster we have an EC pool for archive purposes, it all went well for quite some time and last Sunday one of the

[ceph-users] Re: How to purge/remove rgw from ceph/pacific

2021-09-11 Thread Eugen Block
Edit your rgw service specs and set „unmanaged“ to true so cephadm won’t redeploy a daemon, then remove it as you did before. See [1] for more details. [1] https://docs.ceph.com/en/pacific/cephadm/service-management.html Zitat von Cem Zafer : Hi, How to remove rgw from hosts? When I
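
A possible sequence, sketched with placeholder service and daemon names (rgw.myrgw, host1):

  ceph orch ls rgw --export > rgw.yml
  # edit rgw.yml and add the line "unmanaged: true", then re-apply it
  ceph orch apply -i rgw.yml
  # now the daemon can be removed without cephadm redeploying it
  ceph orch daemon rm rgw.myrgw.host1.abcdef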

[ceph-users] Re: mon stucks on probing and out of quorum, after down and restart

2021-09-10 Thread Eugen Block
is the best practice? redeploy failed mon? On 10. Sep 2021, at 13:08, Eugen Block wrote: Yes, give it a try. If the cluster is healthy otherwise it shouldn't be a problem. Zitat von mk : Thx Eugen, just stopping mon and remove/rename only store.db and start mon? BR Max On 10. Sep 2021, at 12

[ceph-users] Re: mon stucks on probing and out of quorum, after down and restart

2021-09-10 Thread Eugen Block
: Failed with result 'exit-code'. Sep 10 13:35:55 amon3 systemd[1]: Failed to start Ceph cluster monitor daemon. On 10. Sep 2021, at 13:08, Eugen Block wrote: Yes, give it a try. If the cluster is healthy otherwise it shouldn't be a problem. Zitat von mk : Thx Eugen, just stopping mon

[ceph-users] Re: mon stucks on probing and out of quorum, after down and restart

2021-09-10 Thread Eugen Block
Yes, give it a try. If the cluster is healthy otherwise it shouldn't be a problem. Zitat von mk : Thx Eugen, just stopping mon and remove/rename only store.db and start mon? BR Max On 10. Sep 2021, at 12:50, Eugen Block wrote: I don't have an explanation but removing the mon store from

[ceph-users] Re: mon stucks on probing and out of quorum, after down and restart

2021-09-10 Thread Eugen Block
I don't have an explanation but removing the mon store from the failed mon has resolved similar issues in the past. Could you give that a try? Zitat von mk : Hi CephFolks, I have a cluster 14.2.21-22/Ubuntu 18.04 with 3 mon’s. After going down/restart of 1 mon(amon3) it stucks on probing
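
A hedged outline of that procedure for the failed mon (amon3), assuming the default data path of a non-containerized Nautilus mon:

  systemctl stop ceph-mon@amon3
  mv /var/lib/ceph/mon/ceph-amon3/store.db /var/lib/ceph/mon/ceph-amon3/store.db.bak
  systemctl start ceph-mon@amon3   # the mon should resync its store from the quorum
  ceph -s                          # watch until all three mons are back in quorum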

[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying

2021-09-09 Thread Eugen Block
You must have missed the response to your thread, I suppose: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/ Zitat von mabi : Hello, A few days later the ceph status progress bar is still stuck and the third mon is for some unknown reason

[ceph-users] Re: Cephadm not properly adding / removing iscsi services anymore

2021-09-08 Thread Eugen Block
n. Still not exactly sure why that fixed it, but at least it’s working again. Thanks for the suggestion. -Paul On Sep 8, 2021, at 4:12 AM, Eugen Block wrote: If you only configured 1 iscsi gw but you see 3 running, have you tried to destroy them with 'cephadm rm-daemon --name ...'? On the activ

[ceph-users] Re: radosgw manual deployment

2021-09-08 Thread Eugen Block
address) for the domain? I think 99% of the confusion is due to VERY POOR documentation!! Thanks for help. Francesco Il 01.09.21 14:14, Eugen Block ha scritto: That basically was my check list, it was all I had to do in my lab to set it up. The guide to setup a RGW manually refers to non-co

[ceph-users] Re: cephfs_metadata pool unexpected space utilization

2021-09-08 Thread Eugen Block
I assume the cluster is used in roughly the same way as before the upgrade and the load has not increased since, correct? What is the usual load, can you share some 'ceph daemonperf mds.' output? It might be unrelated but have you tried to compact the OSDs belonging to this pool, online or

[ceph-users] Re: Cephadm not properly adding / removing iscsi services anymore

2021-09-08 Thread Eugen Block
If you only configured 1 iscsi gw but you see 3 running, have you tried to destroy them with 'cephadm rm-daemon --name ...'? On the active MGR host run 'journalctl -f' and you'll see plenty of information, it should also contain information about the iscsi deployment. Or run 'cephadm logs

[ceph-users] Re: debug RBD timeout issue

2021-09-08 Thread Eugen Block
Hi, from an older cloud version I remember having to increase these settings: [DEFAULT] block_device_allocate_retries = 300 block_device_allocate_retries_interval = 10 block_device_creation_timeout = 300 The question is what exactly could cause a timeout. You write that you only see these

[ceph-users] Re: Problem mounting cephfs Share

2021-09-07 Thread Eugen Block
Could you share the exact command you're trying and then also 'ceph auth get client.'? Zitat von Hendrik Peyerl : Hi Eugen, thanks for the idea but i didn’t have anything mounted that i could unmount On 6. Sep 2021, at 09:15, Eugen Block wrote: Hi, I just got the same message in my

[ceph-users] Re: Problem mounting cephfs Share

2021-09-06 Thread Eugen Block
Hi, I just got the same message in my lab environment (octopus) which I had redeployed. The client's keyring had changed after redeployment and I think I had a stale mount. After 'umount' and 'mount' with the proper keyring it worked as expected. Zitat von Hendrik Peyerl : Hello All,

[ceph-users] Re: Replacing swift with RGW

2021-09-02 Thread Eugen Block
stack side but fails. Are there openstack-swift packages which are needed? If there are, please help me get them. Maybe that is also the cause of my failure to run the swift command on the openstack cli side. Thank you for your continued support. Micheal On Thu, Sep 2, 2021 at 9:14 AM Eugen Block wrote

[ceph-users] Re: Replacing swift with RGW

2021-09-02 Thread Eugen Block
not finding the object storage on the horizon dashboard, but it appears in the system information services. So my question is how to configure it in order that it can appear in the dashboard. Michel On Wed, Sep 1, 2021 at 3:49 PM Eugen Block wrote: Sorry, one little detail

[ceph-users] Re: Replacing swift with RGW

2021-09-01 Thread Eugen Block
Sorry, one little detail slipped through, the '--region' flag has to be put before the 'service' name. The correct command would be: openstack endpoint create --region RegionOne swift admin http://ceph-osd3:8080/swift/v1 and respectively for the other interfaces. Zitat von Eugen Block
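
Put together, the corrected commands for all three interfaces would look like this (URL and region taken from this thread):

  openstack endpoint create --region RegionOne swift public   http://ceph-osd3:8080/swift/v1
  openstack endpoint create --region RegionOne swift internal http://ceph-osd3:8080/swift/v1
  openstack endpoint create --region RegionOne swift admin    http://ceph-osd3:8080/swift/v1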

[ceph-users] Re: Replacing swift with RGW

2021-09-01 Thread Eugen Block
Please read carefully and inspect the (helpful) error message. The command I provided doesn't have the '--publicurl' in it because that is from an older identity api version (v2). In newer versions (v3) the endpoint commands only require one of the values 'internal', 'public' or 'admin',

[ceph-users] Re: Replacing swift with RGW

2021-09-01 Thread Eugen Block
Hi, this is not a ceph issue but your openstack cli command as the error message states. Try one interface at a time: openstack endpoint create swift public http://ceph-osd3:8080/swift/v1 --region RegionOne swift openstack endpoint create swift admin http://ceph-osd3:8080/swift/v1

[ceph-users] Re: radosgw manual deployment

2021-09-01 Thread Eugen Block
alues that conforms to my installation and applying all the radosgw-admin setup you indicated lead me no results: Always as the beginning The wrong must be somewhere else... Do you have a checklist? Francesco Il 31.08.21 14:53, Eugen Block ha scritto: How exactly did you create the rgw(s),

[ceph-users] Re: radosgw manual deployment

2021-08-31 Thread Eugen Block
s3 gateway; scheme is really http (checked querying with get-rgw-api-scheme). Any clue / suggestion is welcome. Francesco Il 24.08.21 11:22, Eugen Block ha scritto: Hi, I assume that the "latest" docs are already referring to quincy, if you check the pacific docs (https://

[ceph-users] Re: A simple erasure-coding question about redundance

2021-08-27 Thread Eugen Block
Hi, 1. two disks would fail where both failed disks are not on the same host? I think ceph would be able to find a PG distributed across all hosts avoiding the two failed disks, so ceph would be able to repair and reach a healthy status after a while? yes, if there is enough disk space

[ceph-users] Re: radosgw manual deployment

2021-08-24 Thread Eugen Block
Hi, I assume that the "latest" docs are already referring to quincy, if you check the pacific docs (https://docs.ceph.com/en/pacific/mgr/dashboard/) that command is not mentioned. So you'll probably have to use the previous method of configuring the credentials. Regards, Eugen Zitat

[ceph-users] Re: Missing OSD in SSD after disk failure

2021-08-24 Thread Eugen Block
note I'm using containers, not standalone OSDs. Any ideas? Regards, Eric Message: 2 Date: Fri, 20 Aug 2021 06:56:59 + From: Eugen Block Subject: [ceph-users] Re: Missing OSD in SSD after disk failure To: ceph-users@ceph.io Message-ID: <20210820065

[ceph-users] Re: Very beginner question for cephadm: config file for bootstrap and osd_crush_chooseleaf_type

2021-08-20 Thread Eugen Block
Hi, you can just set the config option with 'ceph config set ...' after your cluster has been bootstrapped. See [1] for more details about the config store. [1] https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/#monitor-configuration-database Zitat von Dong Xie : Dear
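
A sketch of the config-store mechanics for the option in the subject (whether it still affects CRUSH rules that were already created at bootstrap is a separate question):

  ceph config set global osd_crush_chooseleaf_type 0
  ceph config get mon osd_crush_chooseleaf_type   # verify the value is in the config store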

[ceph-users] Re: Ceph status shows 'updating'

2021-08-20 Thread Eugen Block
What is the output of 'ceph orch upgrade status'? Did you (maybe accidentally) start an update? You can stop it with 'ceph orch upgrade stop'. Zitat von "Paul Giralt (pgiralt)" : The output of my ’ceph status’ shows the following: progress: Updating node-exporter deployment (-1 ->

[ceph-users] Re: Question about mon and manager(s)

2021-08-20 Thread Eugen Block
Hi, 1. In my cluster I have three monitors; when one monitor is down (I simply shut down) raising a ceph -s underline that there are two monitors alive and one down; when 2/3 of monitors down the cluster became unresponsive (ceph -s remains stuck); is this normal? yes, this is expected.

[ceph-users] Re: Missing OSD in SSD after disk failure

2021-08-20 Thread Eugen Block
Hi, this seems to be a recurring issue, I had the same just yesterday in my lab environment running on 15.2.13. If I don't specify other criteria in the yaml file then I'll end up with standalone OSDs instead of the desired rocksDB on SSD. Maybe this is still a bug, I didn't check. My

[ceph-users] Re: Discard / Trim does not shrink rbd image size when disk is partitioned

2021-08-12 Thread Eugen Block
Hi, have you checked 'rbd sparsify' to reclaim unused space? Zitat von Boris Behrens : Hi everybody, we just stumbled over a problem where the rbd image does not shrink when files are removed. This only happens when the rbd image is partitioned. * We tested it with centos8/ubuntu20.04 with
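
For example (pool and image names are placeholders):

  rbd du rbd/myimage        # check allocated size before
  rbd sparsify rbd/myimage  # deallocate extents that contain only zeroes
  rbd du rbd/myimage        # compare afterwards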

[ceph-users] Re: How to safely turn off a ceph cluster

2021-08-11 Thread Eugen Block
Hi, there's plenty of information available online, e.g. the Red Hat docs [1], mailing list threads [2]. [1]

[ceph-users] Re: Dashboard Montitoring: really suppress messages

2021-07-30 Thread Eugen Block
Hi, you can disable or modify the configured alerts in: /var/lib/ceph//etc/prometheus/alerting/ceph_alerts.yml After restarting the container those changes should be applied. Regards, Eugen Zitat von E Taka <0eta...@gmail.com>: Hi, we have enabled Cluster → Monitoring in the Dashboard.

[ceph-users] Re: large directory /var/lib/ceph/$FSID/removed/

2021-07-28 Thread Eugen Block
iced that before, that it was just these daemons (just FYI, no further help needed here). On Wed, 28 Jul 2021 at 09:10, Eugen Block wrote: Hi, the docs [1] only state: > /var/lib/ceph//removed contains old daemon data > directories for stateful daemons (e.g., monitor, prometheus) th

[ceph-users] Re: large directory /var/lib/ceph/$FSID/removed/

2021-07-28 Thread Eugen Block
Hi, the docs [1] only state: /var/lib/ceph//removed contains old daemon data directories for stateful daemons (e.g., monitor, prometheus) that have been removed by cephadm. So that directory should not grow, I'm not sure if does in your case because you write "now 12 GB". Are you

[ceph-users] Re: OSD failed to load OSD map for epoch

2021-07-28 Thread Eugen Block
smartmontools 7.1, which will crash the kernel on e.g. "smartctl -a /dev/nvme0". Before switching to Octopus containers, I was using smartmontools from Debian backports, which does not have this problem. Does Pacific have newer smartmontools? // Best wishes; Johan On 2021-07-2

[ceph-users] Re: OSD failed to load OSD map for epoch

2021-07-27 Thread Eugen Block
Hi, did you read this thread [1] reporting a similar issue? It refers to a solution described in [2] but the OP in [1] recreated all OSDs, so it's not clear what the root cause was. Can you start the OSD with more verbose (debug) output and share that? Does your cluster really have only

[ceph-users] Re: ceph-users Digest, Vol 102, Issue 52

2021-07-26 Thread Eugen Block
igest..." Today's Topics: 1. Re: inbalancing data distribution for osds with custom device class (renjianxinlover) 2. Re: inbalancing data distribution for osds with custom device class (Eugen Block) -- Message: 1 Date: Wed, 21 J

[ceph-users] Re: Where to find ceph.conf?

2021-07-23 Thread Eugen Block
Hi, you can find the ceph.conf here: /var/lib/ceph/7bdffde0-623f-11eb-b3db-fa163e672db2/mon.ses7-host1/config If you edit that file and restart the container you'll see the changes. But as I wrote in your other thread, this won't be enough to migrate MONs to a different IP address, you

[ceph-users] Re: Procedure for changing IP and domain name of all nodes of a cluster

2021-07-22 Thread Eugen Block
Note that there's a similar field in the nova database (connection_info): ---snip--- MariaDB [nova]> select connection_info from block_device_mapping where instance_uuid='bbc33a1d-10c0-47b1-8179-304899c4546c';

[ceph-users] Re: inbalancing data distribution for osds with custom device class

2021-07-21 Thread Eugen Block
Hi, three OSDs is just not enough, if possible you should add more SSDs to the index pool. Have you checked the disk saturation (e.g. with iostat)? I would expect a high usage. Zitat von renjianxinlover : Ceph: ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous

[ceph-users] Re: How to make CephFS a tiered file system?

2021-07-20 Thread Eugen Block
Hi, I'm not sure if that's what you need but ceph file layouts [1] could meet your requirements. Your CephFS can consist of multiple pools (replicated or EC), and with xattr you can define different pools to be used for specific directories. Does that help? Regards, Eugen [1]
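
A small sketch of such a layout (the file system name 'cephfs', the pool 'cephfs_ec_data' and the mount path are placeholders):

  # make the extra pool available to the file system
  ceph fs add_data_pool cephfs cephfs_ec_data
  # new files below this directory will be written to that pool
  setfattr -n ceph.dir.layout.pool -v cephfs_ec_data /mnt/cephfs/archive
  # inspect the resulting layout
  getfattr -n ceph.dir.layout /mnt/cephfs/archive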

[ceph-users] Re: "ceph fs perf stats" and "cephfs-top" don't work

2021-07-16 Thread Eugen Block
"halflife": 60 }, "recall_caps_throttle": { "value": 0, "halflife": 1.5 }, "recall_caps_throttle2o": { "value": 0, "halflife": 0.5 }, "session_cache_livenes

[ceph-users] Re: "ceph fs perf stats" and "cephfs-top" don't work

2021-07-15 Thread Eugen Block
Hi, I just setup a virtual one-node cluster (16.2.5) to check out cephfs-top. Regarding the number of clients I was a little surprised, too, in the first couple of minutes the number switched back and forth between 0 and 1 although I had not connected any client yet. But after a while

[ceph-users] Re: cephadm stuck in deleting state

2021-07-14 Thread Eugen Block
Hi, do you see the daemon on that iscsi host(s) with 'cephadm ls'? If the answer is yes, you could remove it with cephadm, too: cephadm rm-daemon --name iscsi.iscsi Does that help? Zitat von Fyodor Ustinov : Hi! I have fresh installed pacific root@s-26-9-19-mon-m1:~# ceph version ceph

[ceph-users] Re: PG has no primary osd

2021-07-13 Thread Eugen Block
Hi, what does your 'ceph osd df tree' look like? I've read about these warnings when PGs are incomplete but not when all are active+clean. Zitat von Andres Rojas Guerrero : Hi, recently in a Nautilus cluster version 14.2.6 I have changed the rule crush map to host type instead osd, all

[ceph-users] Re: Continuing Ceph Issues with OSDs falling over

2021-07-07 Thread Eugen Block
Hi, can you tell us a bit more about what exactly happens? Currently I'm having an issue where every time I add a new server it adds the osd on the node and then a few random osds on the current hosts will all fall over and I'll only be able to get them up again by restarting the daemons. What is the

[ceph-users] Re: Ceph Managers dieing?

2021-06-17 Thread Eugen Block
Hi, don't give up on Ceph. ;-) Did you try any of the steps from the troubleshooting section [1] to gather some events and logs? Could you share them, and maybe also some more details about that cluster? Did you enable any non-default mgr modules? There have been a couple reports related

[ceph-users] Re: cephadm failed in Pacific release: Unable to set up "admin" label

2021-06-14 Thread Eugen Block
strange that the commands are all using 'octopus' instead of 'pacific'. Ceph docs are always a bit of detective work... === Ralph On 14.06.21 15:31, Eugen Block wrote: Hi, I asked a similar question three weeks ago (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message

[ceph-users] Re: cephadm failed in Pacific release: Unable to set up "admin" label

2021-06-14 Thread Eugen Block
Hi, I asked a similar question three weeks ago (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/63U7WYHFTHSPUQTAR73W2AIE3E3PJJ4X/), but in my case the bootstrap worked fine. But adding a label (e. g. _admin) had no effect although the host should have had the admin keyring

[ceph-users] Re: Ceph Octopus - How to customize the Grafana configuration

2021-06-10 Thread Eugen Block
# ceph orch start grafana Thanks for your help === Ralph On 10.06.21 09:31, Eugen Block wrote: Hi, you can edit the config file /var/lib/ceph//grafana.host1/etc/grafana/grafana.ini (created by cephadm) and then restart the container. This works in my octopus lab environment.

[ceph-users] Re: Ceph Octopus - How to customize the Grafana configuration

2021-06-10 Thread Eugen Block
Hi, you can edit the config file /var/lib/ceph//grafana.host1/etc/grafana/grafana.ini (created by cephadm) and then restart the container. This works in my octopus lab environment. Regards, Eugen Zitat von Ralph Soika : Hello, I have installed and bootsraped a Ceph manager node via

[ceph-users] Re: delete stray OSD daemon after replacing disk

2021-06-10 Thread Eugen Block
Can you share your 'ceph osd tree'? You can remove the stray osd "old school" with 'ceph osd purge 1 [--force]' if you're really sure. Zitat von mabi : Small correction in my mail below, I meant to say Octopus and not Nautilus, so I am running ceph 15.2.13. ‐‐‐ Original Message

[ceph-users] Re: rebalancing after node more

2021-05-27 Thread Eugen Block
cannot be brought back to up state for some reason, even though osd processes are running on the host. Kind regards, Rok On Thu, May 27, 2021 at 3:32 PM Eugen Block wrote: Hi, this sounds like your crush rule(s) for one or more pools can't place the PGs because the host is missing. Please

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread Eugen Block
On Thursday, May 27, 2021 3:28 PM, Eugen Block wrote: Can you try with both cluster and osd fsid? Something like this: pacific2:~ # cephadm deploy --name osd.2 --fsid acbb46d6-bde3-11eb-9cf2-fa163ebb2a74 --osd-fsid bc241cd4-e284-4c5a-aad2-5744632fc7fc I tried to reproduce a similar scena

[ceph-users] Re: rebalancing after node more

2021-05-27 Thread Eugen Block
Hi, this sounds like your crush rule(s) for one or more pools can't place the PGs because the host is missing. Please share ceph pg dump pgs_brief | grep undersized ceph osd tree ceph osd pool ls detail and the crush rule(s) for the affected pool(s). Zitat von Rok Jaklič : Hi, I have
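
The requested diagnostics in one go (the last command is an addition to dump the rules themselves):

  ceph pg dump pgs_brief | grep undersized
  ceph osd tree
  ceph osd pool ls detail
  ceph osd crush rule dump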

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread Eugen Block
91a86f20-8083-40b1-8bf1-fe35fac3d677 osd id2 osdspec affinity all-available-devices type block vdo 0 devices /dev/sda ‐‐‐ Original Message ‐‐‐ On Thursday, May 27, 2021 12:32 PM,

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread Eugen Block
assert osd_fsid AssertionError Any ideas what is wrong here? Regards, Mabi ‐‐‐ Original Message ‐‐‐ On Thursday, May 27, 2021 12:13 PM, Eugen Block wrote: Hi, I posted a link to the docs [1], [2] just yesterday ;-) You should see the respective OSD in the output of 'cephadm ceph-v

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread Eugen Block
Hi, I posted a link to the docs [1], [2] just yesterday ;-) You should see the respective OSD in the output of 'cephadm ceph-volume lvm list' on that node. You should then be able to get it back to cephadm with cephadm deploy --name osd.x But I haven't tried this yet myself, so please

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Eugen Block
Stian Olstad : On 27.05.2021 11:17, Eugen Block wrote: That's not how it's supposed to work. I tried the same on an Octopus cluster and removed all filters except: data_devices: rotational: 1 db_devices: rotational: 0 My Octopus test osd nodes have two HDDs and one SSD, I removed all OSDs and redeployed

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Eugen Block
That's not how it's supposed to work. I tried the same on an Octopus cluster and removed all filters except: data_devices: rotational: 1 db_devices: rotational: 0 My Octopus test osd nodes have two HDDs and one SSD, I removed all OSDs and redeployed on one node. This spec file results
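
A sketch of such a minimal spec (osd-spec.yml; the service id and host pattern are placeholders) and a dry run:

  # osd-spec.yml
  service_type: osd
  service_id: hdd-with-db-on-ssd
  placement:
    host_pattern: '*'
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

  ceph orch apply -i osd-spec.yml --dry-run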

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Eugen Block
milar behaviour. Zitat von Kai Stian Olstad : On 26.05.2021 18:12, Eugen Block wrote: Could you share the output of lsblk -o name,rota,size,type from the affected osd node? # lsblk -o name,rota,size,type NAME

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-26 Thread Eugen Block
Kai Stian Olstad : On 26.05.2021 11:16, Eugen Block wrote: Yes, the LVs are not removed automatically, you need to free up the VG, there are a couple of ways to do so, for example remotely: pacific1:~ # ceph orch device zap pacific4 /dev/vdb --force or directly on the host with: pacific1

[ceph-users] Re: Ceph osd will not start.

2021-05-26 Thread Eugen Block
tion rules, so it does not try and create too many osds on the same node at the same time. On Wed, 26 May 2021 at 08:25, Eugen Block wrote: Hi, I believe your current issue is due to a missing keyring for client.bootstrap-osd on the OSD node. But even after fixing that you'll probably still

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-26 Thread Eugen Block
Kai Stian Olstad : On 26.05.2021 08:22, Eugen Block wrote: Hi, did you wipe the LV on the SSD that was assigned to the failed HDD? I just did that on a fresh Pacific install successfully, a couple of weeks ago it also worked on an Octopus cluster. No, I did not wipe the LV. Not sure what you
