[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-25 Thread Stefan Kooman
On 10/26/21 01:14, Marco Pizzolo wrote: Hello Everyone, I'm seeing an issue where the podman container is running, but the OSD is being reported as down and out. Restarting the service doesn't help, nor does rebooting the host. What am I missing? Can you try: ceph osd in $osd.id Gr. Stefan
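
A minimal sketch of the commands around Stefan's suggestion, assuming a plain ceph CLI on an admin node; osd.5 is only a placeholder ID, not taken from the thread:

    # show which OSDs the cluster currently marks down
    ceph osd tree down
    ceph osd stat

    # mark the OSD back in, as suggested (replace 5 with the real ID)
    ceph osd in osd.5

    # verify the up/in flags afterwards
    ceph osd dump | grep osd.5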

[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-25 Thread 胡 玮文
Could you post the logs of the problematic OSDs? E.g.: cephadm logs --name osd.0 From: Marco Pizzolo Sent: October 26, 2021 7:15 To: ceph-users Subject: [ceph-users] 16.2.6 OSD down, out but container running Hello Everyone, I'm seeing an issu
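
For reference, a hedged sketch of how those logs can be pulled on a cephadm-managed host; the daemon name osd.0 is only an example:

    # list the daemons cephadm manages on this host, with their exact names
    cephadm ls

    # dump the journal for one OSD daemon (wraps journalctl for that unit)
    cephadm logs --name osd.0

    # arguments after -- are passed through to journalctl, e.g. last 200 lines
    cephadm logs --name osd.0 -- -n 200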

[ceph-users] 16.2.6 OSD down, out but container running....

2021-10-25 Thread Marco Pizzolo
Hello Everyone, I'm seeing an issue where the podman container is running, but the OSD is being reported as down and out. Restarting the service doesn't help, nor does rebooting the host. What am I missing? Thanks, Marco

[ceph-users] Re: Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true

2021-10-25 Thread Igor Fedotov
Hi Beard, curious if that cluster had been created by a pre-Nautilus release, e.g. Luminous or Kraken? Thanks, Igor On 10/22/2021 3:53 PM, Beard Lionel wrote: Hi, I had exactly the same behaviour: - upgrade from nautilus to pacific - same warning message - set config option - restart osd. I

[ceph-users] Re: MDS not becoming active after migrating to cephadm

2021-10-25 Thread Magnus Harlander
Hi, I just migrated to cephadm on my 2-node octopus cluster. I have the same problem with the MDS started in a container not being available to ceph. I had to run the old systemd mds to keep the fs available. Some output: =

[ceph-users] cephadm does not find podman objects for osds

2021-10-25 Thread Magnus Harlander
Hi, after converting my 2 node cluster to cephadm I'm in lots of trouble. - containerized mds are not available in the cluster. I must run mds from systemd to make my fs available. - osd podman objects are not found after a reboot of one node. I don't want to test it on the second node, beca

[ceph-users] Re: RGW/multisite sync traffic rps

2021-10-25 Thread Stefan Schueffler
Hi Istvan, we don't have that many users constantly uploading or deleting objects. In our environment, there are very few (around 2-10) PUTs per second. This is exactly what makes me wonder about the huge number of sync requests - as there should not be any (apart from a very tiny little a

[ceph-users] Re: Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true

2021-10-25 Thread mgrzybowski
Hi Igor In ceph.conf: [osd] debug bluestore = 10/30 systemctl start ceph-osd@2 ~# ls -alh /var/log/ceph/ceph-osd.2.log -rw-r--r-- 1 ceph ceph 416M Oct 25 21:08 /var/log/ceph/ceph-osd.2.log cat /var/log/ceph/ceph-osd.2.log | gzip > ceph-osd.2.log.gz Full compressed log on gdrive: https://driv
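
As a hedged alternative to editing ceph.conf, the same debug level can usually be raised through the config database or injected at runtime; osd.2 follows the message above:

    # persist the debug level via the mon config database
    ceph config set osd.2 debug_bluestore 10/30

    # or inject it into the running daemon (only works while the OSD stays up)
    ceph tell osd.2 config set debug_bluestore 10/30

    # compress the resulting log for sharing
    gzip -c /var/log/ceph/ceph-osd.2.log > ceph-osd.2.log.gz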

[ceph-users] Re: jj's "improved" ceph balancer

2021-10-25 Thread Jonas Jelten
Hi Josh, yes, there's many factors to optimize... which makes it kinda hard to achieve an optimal solution. I think we have to consider all these things, in ascending priority: * 1: Minimize distance to CRUSH (prefer fewest upmaps, and remove upmap items if balance is better) * 2: Relocation o

[ceph-users] Re: jj's "improved" ceph balancer

2021-10-25 Thread Jonas Jelten
Hi Erich! Yes, in most cases the mgr-balancer will happily accept jj-balancer movements and neither reverts nor worsens its optimizations. It generates new upmap items or removes existing ones, just like the mgr-balancer (which has to be in upmap mode, of course). So the intended usage is th
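
As a sketch of the surrounding setup (not of the jj-balancer itself), the built-in mgr-balancer can be checked and switched to upmap mode like this:

    # show the current balancer state and mode
    ceph balancer status

    # upmap items require all clients to be luminous or newer
    ceph osd set-require-min-compat-client luminous

    # put the mgr-balancer into upmap mode; per Jonas it can stay enabled
    ceph balancer mode upmap
    ceph balancer on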

[ceph-users] Re: Rebooting one node immediately blocks IO via RGW

2021-10-25 Thread DHilsbos
Troels; This sounds like a failure domain issue. If I remember correctly, Ceph defaults to a failure domain of disk (osd), while you need a failure domain of host. Could you do a ceph -s while one of the hosts is offline? You're looking for the HEALTH_ flag, and any errors other than slow op
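
A hedged way to check the failure domain and watch the cluster while one host is down; the rule name below is the common default and may differ:

    # overall health while a host is offline (look at the HEALTH_ details)
    ceph -s

    # which CRUSH rule each pool uses, plus size/min_size
    ceph osd pool ls detail

    # the "type" in the chooseleaf/choose step is the failure domain (host vs. osd)
    ceph osd crush rule dump replicated_rule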

[ceph-users] Re: failing dkim

2021-10-25 Thread DHilsbos
MJ; A lot of mailing lists "rewrite" the originating address to one that matches the mailing list server. Here's an example from the Samba mailing list: "samba ; on behalf of; Rowland Penny via samba ". This mailing list relays the email without modifying the sender or the envelope address. For

[ceph-users] Re: How to make HEALTH_ERR quickly and pain-free

2021-10-25 Thread DHilsbos
MJ; Assuming that you have a replicated pool with 3 replicas and min_size = 2, I would think stopping 2 OSD daemons, or 2 OSD containers would guarantee HEALTH_ERR. Similarly, if you have a replicated pool with 2 replicas, still with min_size = 2, stopping 1 OSD should do the trick. Thank you
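
A sketch of that approach on a test cluster; the OSD IDs are examples, and the right way to stop a daemon depends on whether the deployment is cephadm-managed:

    # confirm size and min_size first
    ceph osd pool ls detail

    # classic package-based deployment: stop two OSD daemons
    systemctl stop ceph-osd@1 ceph-osd@2

    # cephadm/containerized deployment: stop them through the orchestrator
    ceph orch daemon stop osd.1
    ceph orch daemon stop osd.2

    # watch the health state change
    ceph -s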

[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Edward R Huyer
No worries. It's a pretty specific problem, and the documentation could be better. -Original Message- From: Yury Kirsanov Sent: Monday, October 25, 2021 12:17 PM To: Edward R Huyer Cc: ceph-users@ceph.io Subject: [ceph-users] Re: Doing SAML2 Auth With Containerized mgrs Hi Edward, Ye

[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Yury Kirsanov
Hi Edward, Yes, you probably are right, I thought about dashboard SSL certificate, not the SAML2, sorry for that. Regards, Yury. On Tue, Oct 26, 2021 at 3:10 AM Edward R Huyer wrote: > I don’t think that’s correct? I already have a certificate set up for > HTTPS, and it doesn’t show up in the

[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Edward R Huyer
I don’t think that’s correct? I already have a certificate set up for HTTPS, and it doesn’t show up in the SAML2 configuration. Maybe I’m mistaken, but I think the SAML2 cert is separate from the regular HTTPS cert? From: Yury Kirsanov Sent: Monday, October 25, 2021 11:52 AM To: Edward R Huye

[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Yury Kirsanov
Hi Edward, You need to set configuration like this, assuming that certificate and key are on your local disk: ceph mgr module disable dashboard ceph dashboard set-ssl-certificate -i .crt ceph dashboard set-ssl-certificate-key -i .key ceph config-key set mgr/cephadm/grafana_crt -i .crt ceph config-
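
Filled in with clearly hypothetical file names (dashboard.crt / dashboard.key are not from the thread), the quoted sequence would look roughly like this; note that, as Edward points out in his reply, these set the dashboard/Grafana HTTPS certificates, not the SAML2 SP certificate:

    # dashboard HTTPS certificate and key
    ceph dashboard set-ssl-certificate -i dashboard.crt
    ceph dashboard set-ssl-certificate-key -i dashboard.key

    # Grafana certificate for cephadm-managed monitoring, as in the quoted message
    ceph config-key set mgr/cephadm/grafana_crt -i dashboard.crt

    # restart the dashboard module so it picks up the new certificate
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard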

[ceph-users] Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Edward R Huyer
Continuing my containerized Ceph adventures, I'm trying to set up SAML2 auth for the dashboard (specifically pointing at the institute's Shibboleth service). The service requires the use of x509 certificates. Following the instructions in the documentation ( https://docs.ceph.com/en/late
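
For context, the documented dashboard SSO command the message refers to takes roughly this shape; every value below is a placeholder, not taken from the thread:

    # ceph dashboard sso setup saml2 <ceph_dashboard_base_url> <idp_metadata>
    #     [<idp_username_attribute>] [<idp_entity_id>] [<sp_x_509_cert>] [<sp_private_key>]
    ceph dashboard sso setup saml2 \
        https://dashboard.example.edu:8443 \
        https://idp.example.edu/idp/shibboleth \
        uid \
        https://idp.example.edu/idp/shibboleth \
        /root/sp.crt /root/sp.key

    # confirm SSO is configured, then enable it
    ceph dashboard sso status
    ceph dashboard sso enable saml2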

[ceph-users] Re: jj's "improved" ceph balancer

2021-10-25 Thread E Taka
Hi Jonas, I'm impressed, thanks! I have a question about the usage: do I have to turn off the automatic balancing feature (ceph balancer off)? Do the upmap balancer and your customizations get in each other's way, or can I run your script from time to time? Thanks Erich On Mon., 25 Oct 2021 at

[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Boris Behrens
Hi Casey, thanks a lot for that hint. That sounds a lot like this is the problem. Is there a way to show incomplete multipart uploads via radosgw-admin? So I would be able to cancel them. Upgrading to octopus might take a TON of time, as we have 1.1 PiB on 160 rotational OSDs. :) On Mon., 25.
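
One hedged way to enumerate and abort the stuck uploads from the client side (rather than via radosgw-admin) is the S3 API itself, e.g. with the AWS CLI; endpoint and bucket below are placeholders, and if the listing is hit by the same RGW loop it may hang just like s3cmd:

    # list incomplete multipart uploads for the bucket
    aws --endpoint-url https://rgw.example.org s3api list-multipart-uploads \
        --bucket BUCKETNAME

    # abort one upload using the Key and UploadId from the listing
    aws --endpoint-url https://rgw.example.org s3api abort-multipart-upload \
        --bucket BUCKETNAME --key OBJECTKEY --upload-id UPLOADID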

[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Casey Bodley
Hi Boris, this sounds a lot like https://tracker.ceph.com/issues/49206, which says "When deleting a bucket with an incomplete multipart upload that has about 2000 parts uploaded, we noticed an infinite loop, which stopped s3cmd from deleting the bucket forever." I'm afraid this fix was merged afte

[ceph-users] failing dkim

2021-10-25 Thread mj
Hi, This is not about ceph, but about this ceph-users mailing list. We have recently started using DKIM/DMARC/SPF etc., and since then we notice that the emails from this ceph-users mailing list come with either a - failing DKIM signature or - no DKIM signature at all. Many of the other mailingl

[ceph-users] s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Boris Behrens
Good day everybody, I just came across very strange behavior. I have two buckets where s3cmd hangs when I try to show current multipart uploads. When I use --debug I see that it loops over the same response. What I tried in order to fix it on one bucket: * radosgw-admin bucket check --bucket=BUCKETNAME *

[ceph-users] Re: jj's "improved" ceph balancer

2021-10-25 Thread Jonas Jelten
Hi Dan, basically it's this: when you have a server that is so big that crush can't utilize it the same way as the other, smaller servers because of the placement constraints, the balancer doesn't balance data on the smaller servers any more, because it just "sees" the big one as too empty. To m

[ceph-users] Re: v15.2.15 Octopus released

2021-10-25 Thread Stefan Kooman
On 10/20/21 21:57, David Galloway wrote: We're happy to announce the 15th backport release in the Octopus series. We recommend users to update to this release. ... Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at https://download.ceph.com/tarballs/ceph-15.2.15.t

[ceph-users] Re: Rebooting one node immediately blocks IO via RGW

2021-10-25 Thread Eugen Block
Hi, what's the pool's min_size? ceph osd pool ls detail Quoting Troels Hansen: I have a strange issue. It's a 3-node cluster, deployed on Ubuntu, on containers, running version 15.2.4, docker.io/ceph/ceph:v15 It's only running RGW, and everything seems fine, and everything works. No er
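
A small sketch of the check Eugen asks for; POOLNAME is a placeholder:

    # size and min_size are shown per pool in the detailed listing
    ceph osd pool ls detail

    # or query a single pool directly
    ceph osd pool get POOLNAME size
    ceph osd pool get POOLNAME min_size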

[ceph-users] Rebooting one node immediately blocks IO via RGW

2021-10-25 Thread Troels Hansen
I have a strange issue. It's a 3-node cluster, deployed on Ubuntu, on containers, running version 15.2.4, docker.io/ceph/ceph:v15 It's only running RGW, and everything seems fine, and everything works. No errors and the cluster is healthy. As soon as one node is restarted, all IO is blocked, app