[ceph-users] Re: monitor not joining quorum

2021-10-19 Thread Denis Polom
Hi, I've checked it: there is no IP address collision, the ARP tables are OK, the MTU too, and according to tcpdump no packets are being lost. On 10/19/21 21:36, Konstantin Shalygin wrote: Hi, On 19 Oct 2021, at 21:59, Denis Polom wrote: 2021-10-19 16:22:07.629 7faec9dd2700  1

[ceph-users] config db host filter issue

2021-10-19 Thread Richard Bade
Hi Everyone, I think this might be a bug so I'm wondering if anyone else has seen this. The issue is that config db filters for host don't seem to work. I was able to reproduce this on both prod and dev clusters that I tried it on with Nautilus 14.2.22. The osd I'm testing (osd.0) is under this

[ceph-users] Re: Cluster down

2021-10-19 Thread Alex Gorbachev
Hi Jorge, I am referring to a whole different network. Redundant switches are fine to protect against physical failure, but they can get congested too. Refer to https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy Section 5.8.1 Our setup is - ring0 goes on the fast switches

[ceph-users] Trying to debug "Failed to send data to Zabbix"

2021-10-19 Thread shubjero
Hey all, Recently upgraded to Ceph Octopus (15.2.14). We also run Zabbix 5.0.15. Have had ceph/zabbix monitoring for a long time. After the Ceph Octopus update I installed the latest version of the Ceph template in Zabbix

[ceph-users] Re: [EXTERNAL] Re: Multisite Pubsub - Duplicates Growing Uncontrollably

2021-10-19 Thread Alex Hussein-Kershaw
Hi Yuval, Thanks again for the info, also for opening the tracker issue, we'll keep an eye on that/update with comments as we progress this issue. We're using two pubsub zones as we have two clients working in sync (using both S3 and CephFS to provide storage for complex/historical reasons

[ceph-users] Re: monitor not joining quorum

2021-10-19 Thread Denis Polom
Also, in the logs of the monitor that is unable to join the quorum I see the following: 2021-10-19 16:22:07.629 7faec9dd2700  1 mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign global_id 2021-10-19 16:22:08.193 7faec8dd0700  1 mon.ceph1@0(synchronizing) e4

[ceph-users] Re: Stretch cluster experiences in production?

2021-10-19 Thread Gregory Farnum
On Tue, Oct 19, 2021 at 9:11 AM Matthew Vernon wrote: > > Hi, > > On 18/10/2021 23:34, Gregory Farnum wrote: > > On Fri, Oct 15, 2021 at 8:22 AM Matthew Vernon > > wrote: > > >> Also, if I'm using RGWs, will they do the right thing location-wise? > >> i.e. DC A RGWs will talk to DC A OSDs

[ceph-users] Re: A change in Ceph leadership...

2021-10-19 Thread Alex Gorbachev
Hi Sage, It's been an amazing journey - the Ceph project has made a huge difference in the way the world stores information, likely for many years to come. The stability and scalability of Ceph have been a real pleasure to work with. We at ISS wish you all the best in your next endeavors, and forever

[ceph-users] Re: Stretch cluster experiences in production?

2021-10-19 Thread Clyso GmbH - Ceph Foundation Member
Hi Matthew, we are currently testing the feature and looking at the limits for EC support. Greetings, Joachim ___ Clyso GmbH - Ceph Foundation Member supp...@clyso.com https://www.clyso.com Am 15.10.2021 um 17:21 schrieb Matthew Vernon: Hi, Stretch

[ceph-users] 16.2.6 OSD Heartbeat Issues

2021-10-19 Thread Marco Pizzolo
Hi Everyone, For a new build we tested the 5.4 kernel which wasn't working well for us and ultimately changed to Ubuntu 20.04.3 HWE and 5.11 kernel. We can now get all OSDs more or less up, but on a clean OS reinstall we are seeing this type of behavior that is causing slow ops even before any

[ceph-users] Questions about tweaking ceph rebalancing activities

2021-10-19 Thread ceph-users
Hello all, I am in the process of adding and removing a number of OSDs in my cluster and I'm running into some issues where it would be good to be able to control the system a bit better. I've tried the documentation and google-fu but have come up short. This is the background/scenario: I

[ceph-users] Re: monitor not joining quorum

2021-10-19 Thread denispolom
Hi Adam, it's ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable) 19. 10. 2021 18:19:29 Adam King : > Hi Denis, > > Which ceph version is your cluster running on? I know there was an issue with > mons getting dropped from the monmap (and therefore being stuck out

[ceph-users] Re: monitor not joining quorum

2021-10-19 Thread Adam King
Hi Denis, Which ceph version is your cluster running on? I know there was an issue with mons getting dropped from the monmap (and therefore being stuck out of quorum) when their host was rebooted in Pacific versions prior to 16.2.6 (https://tracker.ceph.com/issues/51027). If you're on a Pacific

[ceph-users] Re: Stretch cluster experiences in production?

2021-10-19 Thread Matthew Vernon
Hi, On 18/10/2021 23:34, Gregory Farnum wrote: On Fri, Oct 15, 2021 at 8:22 AM Matthew Vernon wrote: Also, if I'm using RGWs, will they do the right thing location-wise? i.e. DC A RGWs will talk to DC A OSDs wherever possible? Stretch clusters are entirely a feature of the RADOS layer at

[ceph-users] monitor not joining quorum

2021-10-19 Thread Denis Polom
Hi, one of our monitor VMs was rebooted and is not joining the quorum again (the quorum consists of 3 monitors). While the monitor service (ceph1) is running on this VM, the Ceph cluster becomes unreachable. In the monitor logs on the ceph3 VM I can see a lot of the following messages: 2021-10-19 17:50:19.555

[ceph-users] Multisite RGW - Secondary zone's data pool bigger than master

2021-10-19 Thread mhnx
Hello! I'm using Nautilus 14.2.16 with a Multisite RGW setup. I have 2 zones, working as Active-Passive (Read-Only). On the master zone the "ceph df" result is: POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL prod.rgw.buckets.index 54 128

[ceph-users] Re: towards a new ceph leadership team

2021-10-19 Thread Josh Durgin
On 10/15/21 08:53, Josh Durgin wrote: Hello folks, over the past few weeks the Ceph Leadership Team has been processing Sage's departure and figuring out how to run the project going forward. We all appreciate Sage's leadership over the past 17 years, and will dearly miss him. He did give us a

[ceph-users] Ceph Pacific (16.2.6) - Orphaned cache tier objects?

2021-10-19 Thread David Herselman
Hi Everyone, We appear to have a problem with ghost objects, most probably from when we were running Nautilus or even earlier. We have a few Ceph clusters in production (11) where a few (4) run with SSD cache tiers for HDD RBD pools. My Google-Fu appears to be failing me, none of the

[ceph-users] Re: Questions about tweaking ceph rebalancing activities

2021-10-19 Thread Jan-Philipp Litza
You are basically listing all the reasons one shouldn't have too much misplacement at once. ;-) Your best bet probably is pgremapper [1] that I've recently learned about on this list. With `cancel-backfill`, you could stop any running backfill. With `undo-upmaps` you could then specifically start

[ceph-users] create osd on spdk nvme device failed

2021-10-19 Thread lin sir
Hello everyone: When using vstart to create an SPDK test cluster, I got this error: 2021-10-19T06:38:51.245+ 7f5a83323340 -1 auth: unable to find a keyring on /root/ceph/build/dev/osd0/keyring: (2) No such file or directory 2021-10-19T06:38:51.245+ 7f5a83323340 -1 auth: unable to find a

[ceph-users] Re: Which verison of ceph is better

2021-10-19 Thread Martin Verges
Use pacific for new deployments. -- Martin Verges Managing director Mobile: +49 174 9335695 | Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 Web: https://croit.io | YouTube:

[ceph-users] Re: Stretch cluster experiences in production?

2021-10-19 Thread Martin Verges
Hello Matthew, building stretch clusters is not a big deal. It works quite well and is stable as long as you have your network under control. This is the most error-prone part of a stretch cluster but can easily be solved when you choose a good vendor and network gear. For 3 data centers make sure to