[ceph-users] Re: [16.2.6] When adding new host, cephadm deploys ceph image that no longer exists

2021-09-29 Thread Andrew Gunnerson
Thank you very much! The previous attempts at adding new hosts with the missing image seem to have left cephadm in a bad state. We restarted the mgrs and then did an upgrade to the same version using: ceph orch upgrade start --ceph-version 16.2.6, and that seems to have deployed new images
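
A minimal sketch of that recovery flow (failing over the active mgr is one way to "restart" it; adjust to your environment):

    ceph mgr fail                                   # fail over to a standby mgr
    ceph orch upgrade start --ceph-version 16.2.6   # re-run the upgrade to the same version
    ceph orch upgrade status                        # watch progress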

[ceph-users] reducing mon_initial_members

2021-09-29 Thread Rok Jaklič
Can I reduce mon_initial_members to one host after already being set to two hosts?
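
For reference, mon_initial_members is generally only consulted while the cluster first forms quorum; after that the monmap is authoritative. A hypothetical single-entry ceph.conf (host name and address are placeholders):

    [global]
    mon_initial_members = mon-a
    mon_host = 192.168.0.10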

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Szabo, Istvan (Agoda)
Actually I don't have a containerized deployment, mine is a normal one. So the lvm migrate should work. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com
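
A rough sketch of the non-containerized migration (OSD id, fsid and LV names are placeholders; the OSD must be stopped first):

    systemctl stop ceph-osd@0
    ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> --from db wal --target <vg>/<block-lv>
    systemctl start ceph-osd@0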

[ceph-users] Re: prometheus - figure out which mgr (metrics endpoint) that is active

2021-09-29 Thread Ernesto Puerta
Hi Karsten, Endpoints returning no data shouldn't be an issue. If all endpoints are scraped under the same job, they'll only differ on the "instance" label. The "instance" label is being progressively removed from the ceph_* metric queries (as it only makes sense for node exporter ones). In the
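
A quick way to see which mgr endpoint currently serves data, assuming the default prometheus module port 9283 and hypothetical host names:

    for h in mgr1 mgr2 mgr3; do
        printf '%s: ' "$h"
        curl -s "http://$h:9283/metrics" | wc -l    # the active mgr returns many lines, standbys few or none
    done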

[ceph-users] Re: [16.2.6] When adding new host, cephadm deploys ceph image that no longer exists

2021-09-29 Thread David Orman
It appears that when an updated container for 16.2.6 was pushed (the first release included a remoto version with a bug), the old one was removed from quay. We had to update our 16.2.6 clusters to the 'new' 16.2.6 version, and just did the typical upgrade with the image specified. This
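
A sketch of such an upgrade pinned to an explicit image rather than a version (the tag shown is an example):

    ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6
    ceph orch upgrade status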

[ceph-users] Re: Leader election loop reappears

2021-09-29 Thread Manuel Holtgrewe
Hi, thanks for the suggestion. If I again get a rogue MON, I'll try to do this. I'll then also need to figure out how to pull the metadata from the host; it might be visible with `docker inspect`. Cheers, On Wed, Sep 29, 2021 at 6:06 PM wrote: > Manuel; > > Reading through this
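
A possible way to pull that information, assuming a docker-based cephadm deployment (container name is a placeholder):

    ceph mon metadata                                     # mon metadata as seen by the cluster
    docker ps --filter name=mon                           # find the mon container on the host
    docker inspect <container> | jq '.[0].Config.Image'   # image the container was started from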

[ceph-users] Re: Leader election loop reappears

2021-09-29 Thread DHilsbos
Manuel; Reading through this mailing list this morning, I can't help but mentally connect your issue to Javier's issue. In part because you're both running 16.2.6. Javier's issue seems to be that OSDs aren't registering public / cluster network addresses correctly. His most recent message

[ceph-users] [16.2.6] When adding new host, cephadm deploys ceph image that no longer exists

2021-09-29 Thread Andrew Gunnerson
Hello all, I'm trying to troubleshoot a test cluster that, when adding a new host, attempts to deploy an old quay.io/ceph/ceph@sha256: image that no longer exists. The cluster is running 16.2.6 and was deployed last week with: cephadm bootstrap --mon-ip $(facter -p ipaddress)
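
One thing worth checking in this situation is whether cephadm is pinned to a stale image digest; a hedged sketch (the tag is an example):

    ceph config dump | grep container_image                            # image new daemons would be deployed with
    ceph config set global container_image quay.io/ceph/ceph:v16.2.6   # repin to an image that still exists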

[ceph-users] Re: osd marked down

2021-09-29 Thread Abdelillah Asraoui
I must have imported the osd.2 key instead; now osd.3 has the same key as osd.2 after running: ceph auth import -i osd.3.export How do we update this? Thanks! On Wed, Sep 29, 2021 at 2:13 AM Eugen Block wrote: > Just to clarify, you didn't simply import the unchanged keyring but > modified it to reflect

[ceph-users] Write Order during Concurrent S3 PUT on RGW

2021-09-29 Thread Scheurer François
Dear All, RGW provides atomic PUT in order to guarantee write consistency. cf: https://ceph.io/en/news/blog/2011/atomicity-of-restful-radosgw-operations/ But my understanding is that there are no guarantees regarding the PUT order sequence. So basically, if doing a storage class migration: aws

[ceph-users] Re: [EXTERNAL] RE: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-09-29 Thread Dave Piper
Some interesting updates on our end. This cluster (condor) is in a multisite RGW zonegroup with another cluster (albans). Albans is still on nautilus and was healthy back when we started this thread. As a last resort, we decided to destroy condor and recreate it, putting it back in the

[ceph-users] Failing to mount PVCs

2021-09-29 Thread Fatih Ertinaz
Hi, We recently started to observe issues similar to the following in our cluster environment: Warning FailedMount 31s (x8 over 97s) kubelet, ${NODEIP} MountVolume.SetUp failed for volume "${PVCNAME}" : mount command failed, status: Failure, reason: failed to mount volume /dev/rbd2 [ext4] to
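
A few hedged checks on the affected node (the /dev/rbd2 device comes from the error above; fsck is run read-only here):

    rbd showmapped            # confirm which image is mapped to /dev/rbd2
    dmesg | tail -n 50        # look for ext4 or rbd errors around the failed mount
    fsck.ext4 -n /dev/rbd2    # read-only filesystem check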

[ceph-users] rgw user metadata default_storage_class not honnored

2021-09-29 Thread Scheurer François
Dear All, The rgw user metadata "default_storage_class" is not working as expected on Nautilus 14.2.15. See the doc: https://docs.ceph.com/en/nautilus/radosgw/placement/#user-placement S3 API PUT with the header x-amz-storage-class:NVME is working as expected. But without this header RGW
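
A minimal sketch of setting the user's default storage class with radosgw-admin, assuming the default placement target (uid and class name are placeholders):

    radosgw-admin user modify --uid=<uid> --placement-id=default-placement --storage-class=NVME
    radosgw-admin user info --uid=<uid> | grep -A2 default_placement    # verify default_storage_class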

[ceph-users] Re: prometheus - figure out which mgr (metrics endpoint) that is active

2021-09-29 Thread Karsten Nielsen
OK thanks for that explanation. Would be awesome if you got time to do the patches upstream. It does seem like a lot of work. I will get cracking at it. On 28-09-2021 22:38, David Orman wrote: We scrape all mgr endpoints since we use external Prometheus clusters, as well. The query results

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
That's what I did and pasted the results in my previous comments. Quoting 胡 玮文: Yes. And the “cephadm shell” command does not depend on the running daemon; it will start a new container. So I think it is perfectly fine to stop the OSD first, then run the “cephadm shell” command, and run

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread 胡 玮文
Yes. And the “cephadm shell” command does not depend on the running daemon; it will start a new container. So I think it is perfectly fine to stop the OSD first, then run the “cephadm shell” command, and run ceph-volume in the new shell. From: Eugen Block Sent: 2021-09-29 21:40
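
A sketch of that containerized flow (OSD id, fsid and LV names are placeholders):

    ceph orch daemon stop osd.0
    cephadm shell -n osd.0
    # inside the shell:
    ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> --from db --target <vg>/<block-lv>
    exit
    ceph orch daemon start osd.0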

[ceph-users] Leader election loop reappears

2021-09-29 Thread Manuel Holtgrewe
Dear all, I was a bit too optimistic in my previous email. It looks like the leader election loop reappeared. I could fix it by stopping the rogue mon daemon but I don't know how to fix it for good. I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in total). I have about 35
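
For reference, a hedged sketch of stopping a rogue mon on a cephadm-managed host (mon name is a placeholder; commands may hang while there is no quorum):

    ceph mon stat                        # which mons are in or out of quorum
    ceph orch daemon stop mon.<host>     # stop the misbehaving mon
    # if orch itself hangs without quorum, stopping the mon's container or systemd unit directly is the fallback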

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
The OSD has to be stopped in order to migrate DB/WAL; it can't be done live. ceph-volume requires a lock on the device. Quoting 胡 玮文: I’ve not tried it, but how about: cephadm shell -n osd.0 then run “ceph-volume” commands in the newly opened shell. The directory structure seems

[ceph-users] Re: [EXTERNAL] RE: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-09-29 Thread Igor Fedotov
On 9/21/2021 10:44 AM, Dave Piper wrote: I still can't find a way to get ceph-bluestore-tool working in my containerized deployment. As soon as the OSD daemon stops, the contents of /var/lib/ceph/osd/ceph- are unreachable. Some speculations on the above. /var/lib/ceph/osd/ceph- is just a
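
One hedged way to reach the OSD's data dir while the daemon is stopped is a per-daemon cephadm shell (OSD id is a placeholder; run on the OSD's host):

    cephadm shell --name osd.2                                      # mounts the OSD's data dir in a fresh container
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2
    ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-2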

[ceph-users] Re: Cephadm set rgw SSL port

2021-09-29 Thread Sebastian Wagner
Here you go: https://github.com/ceph/ceph/pull/43332 On 28.09.21 at 15:49, Sebastian Wagner wrote: > On 28.09.21 at 15:12, Daniel Pivonka wrote: >> Hi, >> >> 1. I believe the field is called 'rgw_frontend_port' >> 2. I don't think something like that exists but probably should > > At least for
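
A hypothetical service spec using that field (service id, port and placement are placeholders):

    # rgw.yaml
    service_type: rgw
    service_id: myrgw
    placement:
      count: 1
    spec:
      rgw_frontend_port: 8443
      ssl: true
      rgw_frontend_ssl_certificate: |
        -----BEGIN CERTIFICATE-----
        ...

    # then:
    ceph orch apply -i rgw.yaml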

[ceph-users] Re: 16.2.6: clients being incorrectly directed to the OSDs cluster_network address

2021-09-29 Thread Javier Cacheiro
Digging further and checking the osd metadata, I see what seems like a bug in how the addresses are assigned. For most OSDs, ceph osd metadata looks fine and the front addresses are correctly configured on the 10.113 public network, like this one: == osd.0 == "back_addr": "[v2:
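
For anyone comparing, the per-OSD addresses and the configured networks can be checked with (OSD id is an example):

    ceph osd metadata 0 | grep -E '"(front|back)_addr"'
    ceph config dump | grep -E '(public|cluster)_network'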

[ceph-users] Re: S3 Bucket Notification requirement

2021-09-29 Thread Yuval Lifshitz
aws-cli v2 does not support the old signature types. Can you please install aws-cli v1 [1] and try with it? [1] https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html On Mon, Sep 27, 2021 at 6:45 PM Sanjeev Jha wrote: > Hi Yuval, > > I have changed the sns signature version as
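
A minimal sketch of installing aws-cli v1 via pip:

    python3 -m pip install --user 'awscli>=1,<2'
    ~/.local/bin/aws --version    # v1 binary; plain `aws` may still resolve to v2 if both are installed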

[ceph-users] Re: osd marked down

2021-09-29 Thread Eugen Block
Just to clarify, you didn't simply import the unchanged keyring but modified it to reflect the actual key of OSD.3, correct? If not, run 'ceph auth get osd.3' first and set the key in the osd.3.export file before importing it to ceph. Quoting Abdelillah Asraoui: I have created
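
A hedged sketch of that repair sequence (paths and ids follow the thread above):

    ceph auth get osd.3                       # key the cluster currently has (wrong after the bad import)
    cat /var/lib/ceph/osd/ceph-3/keyring      # key the OSD itself uses
    # edit osd.3.export so its key matches the OSD's own keyring, then:
    ceph auth import -i osd.3.export
    ceph auth get osd.3                       # confirm the key now matches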

[ceph-users] SSD partitioned for HDD wal+db plus SSD osd

2021-09-29 Thread Chris Dunlop
Hi, Is there any way of using "ceph orch apply osd" to partition an SSD as wal+db for a HDD OSD, with the rest of the SSD as a separate OSD? E.g. on a machine (here called 'k1') with a small boot drive and a single HDD and SSD, this will create an OSD on the HDD, with wal+db on a 60G
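
The HDD-plus-DB half of that could be expressed with a drive group spec along these lines (service id, host and size are placeholders); whether the remaining SSD space can then be consumed as a separate OSD by another spec is exactly the open question:

    # osd_spec.yaml
    service_type: osd
    service_id: hdd-with-ssd-db
    placement:
      hosts:
        - k1
    spec:
      data_devices:
        rotational: 1
      db_devices:
        rotational: 0
      block_db_size: 60G

    # then:
    ceph orch apply -i osd_spec.yaml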