[ceph-users] Re: nautilus mgr die when the balancer runs

2022-12-13 Thread Boris
After some manual rebalancing, all PGs went into a clean state and I was able to start the balancer again. ¯\_(ツ)_/¯ > On 14.12.2022 at 01:18, Boris Behrens wrote: > > Hi, > we had an issue with an old cluster, where we put disks from one host > to another. > We destroyed the disks and

[ceph-users] MTU Mismatch between ceph Daemons

2022-12-13 Thread Stolte, Felix
Hi guys, we had some issues with our CephFS recently, which have probably been caused by an MTU mismatch (partly at least). The scenario was the following: OSD servers: MTU 9000 on public and cluster network. MON+MDS: MTU 1500 on public network. CephFS clients (kernel mount): MTU 9000 on public network. RBD
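A quick way to confirm a mismatch like this is a don't-fragment ping sized to the jumbo MTU: the usable ICMP payload is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A minimal sketch of that arithmetic (the 9000/1500 values come from the scenario above; the helper name is my own, not from the original mail):

```python
# Largest ICMP echo payload that fits in one frame for a given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
def max_icmp_payload(mtu: int) -> int:
    return mtu - 20 - 8

# For the MTUs in this thread:
print(max_icmp_payload(9000))  # 8972
print(max_icmp_payload(1500))  # 1472
```

If `ping -M do -s 8972 <peer>` succeeds between two hosts but a 1500-byte hop sits on the path, jumbo-sized OSD/MON traffic will be dropped rather than fragmented, which matches the symptoms described.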

[ceph-users] CFP: Everything Open 2023 (Melbourne, Australia, March 14-16)

2022-12-13 Thread Tim Serong
Everything Open is a new open tech conference auspiced by Linux Australia. For background see: https://everythingopen.au/news/introducing-everything-open/ For the CFP and other details, read on... Forwarded Message Subject: [Announce] Everything Open, All At Once! Date:

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-13 Thread Stolte, Felix
The issue is resolved now. After verifying that all ESX hosts are configured for MRU, I took a closer look at the paths on each host. `gwcli` reported that the LUN in question was owned by gateway A, but one ESX host used the path to gateway B for I/O. I reconfigured that particular host and it's now using

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-13 Thread Xiubo Li
On 14/12/2022 06:54, Joe Comeau wrote: I am curious about what is happening with your iSCSI configuration. Is this a new iSCSI config or something that has just cropped up? We are using/have been using VMware for 5+ years with iSCSI. We are using the kernel iSCSI vs tcmu. Do you mean

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heartbeats (and get marked as down)

2022-12-13 Thread Boris Behrens
You could try to do this in a screen session for a while. while true; do radosgw-admin gc process; done Maybe your normal RGW daemons are too busy for GC processing. We have this in our config and have started extra RGW instances for GC only: [global] ... # disable garbage collector default
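For reference, the split Boris describes can be sketched in ceph.conf roughly like this. This is a sketch only: `rgw_enable_gc_threads` is the relevant RGW option, but the instance section name `client.rgw.gc1` is a placeholder of mine, not taken from the original mail:

```ini
[global]
# disable the garbage collector in the regular RGW instances
rgw_enable_gc_threads = false

[client.rgw.gc1]
# dedicated instance that only runs garbage collection (placeholder name)
rgw_enable_gc_threads = true
```

The idea is that the client-facing instances never spend time on GC, while one or more dedicated instances do nothing else.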

[ceph-users] nautilus mgr die when the balancer runs

2022-12-13 Thread Boris Behrens
Hi, we had an issue with an old cluster, where we put disks from one host to another. We destroyed the disks and added them as new OSDs, but since then the mgr daemons were restarting at 120 s intervals. I tried to debug it a bit, and it looks like the balancer is the problem. I tried to disable it

[ceph-users] Remove radosgw entirely

2022-12-13 Thread Fox, Kevin M
Is there any problem removing the radosgw and all backing pools from a cephadm managed cluster? Ceph won't become unhappy about it? We have one cluster with a really old, historical radosgw we think would be better to remove and someday later, recreate fresh. Thanks, Kevin

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-13 Thread Joe Comeau
I am curious about what is happening with your iSCSI configuration. Is this a new iSCSI config or something that has just cropped up? We are using/have been using VMware for 5+ years with iSCSI. We are using the kernel iSCSI vs tcmu. We are running ALUA and all datastores are set up as RR. We

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-13 Thread Sascha Lucas
Hi William, On Mon, 12 Dec 2022, William Edwards wrote: On 12 Dec 2022 at 22:47, Sascha Lucas wrote the following: Ceph "servers" like MONs, OSDs, MDSs etc. are all 17.2.5/cephadm/podman. The filesystem kernel clients are co-located on the same hosts running the "servers".

[ceph-users] Re: mds stuck in standby, not one active

2022-12-13 Thread Patrick Donnelly
On Tue, Dec 13, 2022 at 2:21 PM Mevludin Blazevic wrote: > > Hi, > > thanks for the quick response! > > CEPH STATUS: > > cluster: > id: 8c774934-1535-11ec-973e-525400130e4f > health: HEALTH_ERR > 7 failed cephadm daemon(s) > There are daemons running an

[ceph-users] Re: What happens when a DB/WAL device runs out of space?

2022-12-13 Thread Vladimir Brik
> The DB uses "fixed" sizes like 3, 30, 300 G for different levels of > data, and when it needs to start filling a new level and it doesn't fit, > this level moves over to the data device. I thought this no longer applied since the changes in Pacific that Nathan mentioned? Vlad On 12/13/22 12:46,
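The pre-Pacific rule of thumb quoted above (only whole RocksDB levels count) can be sketched as follows. This is an illustration of the sizing logic only, using the commonly cited 3/30/300 GB thresholds; the function name is mine, and as Vlad notes, Pacific's per-level sizing changes make this rule largely obsolete:

```python
# Pre-Pacific rule of thumb: a DB partition is only "used" up to the
# largest whole RocksDB level it can hold (~3, 30, 300 GB by default);
# capacity between two thresholds is effectively wasted, and the next
# level spills over to the slow data device.
LEVEL_THRESHOLDS_GB = (3, 30, 300)

def effective_db_gb(partition_gb: float) -> float:
    """Largest level threshold that fits entirely on the DB partition."""
    usable = 0.0
    for level in LEVEL_THRESHOLDS_GB:
        if level <= partition_gb:
            usable = level
    return usable

print(effective_db_gb(60))  # 30 -> a 60 GB partition behaves like 30 GB
print(effective_db_gb(2))   # 0  -> everything spills to the HDD
```

Under this old model, sizing a DB partition at, say, 60 GB bought nothing over 30 GB, which is why the fixed thresholds mattered so much before Pacific.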

[ceph-users] Re: mds stuck in standby, not one active

2022-12-13 Thread Mevludin Blazevic
Hi, thanks for the quick response! CEPH STATUS:
  cluster:
    id: 8c774934-1535-11ec-973e-525400130e4f
    health: HEALTH_ERR
      7 failed cephadm daemon(s)
      There are daemons running an older version of ceph
      1 filesystem is degraded
      1 filesystem

[ceph-users] Re: mds stuck in standby, not one active

2022-12-13 Thread Patrick Donnelly
On Tue, Dec 13, 2022 at 2:02 PM Mevludin Blazevic wrote: > > Hi all, > > in Ceph Pacific 16.2.5, the MDS failover function is not working. The > one host with the active MDS had to be rebooted, and after that, the > standby daemons did not jump in. The fs was not accessible; instead all > mds

[ceph-users] mds stuck in standby, not one active

2022-12-13 Thread Mevludin Blazevic
Hi all, in Ceph Pacific 16.2.5, the MDS failover function is not working. The one host with the active MDS had to be rebooted, and after that, the standby daemons did not jump in. The fs was not accessible; instead, all MDS daemons remain in standby until now. Also, the cluster remains in Ceph Error

[ceph-users] Re: What happens when a DB/WAL device runs out of space?

2022-12-13 Thread Janne Johansson
On Tue, 13 Dec 2022 at 17:47, Vladimir Brik wrote: > > Hello > > I have a bunch of HDD OSDs with DB/WAL devices on SSD. If > the current trends continue, the DB/WAL devices will become > full before the HDDs completely fill up (e.g. a 50% full HDD > has a DB/WAL device that is about 65% full). > >

[ceph-users] Announcing go-ceph v0.19.0

2022-12-13 Thread Anoop C S
We are happy to announce another release of the go-ceph API library. This is a regular release following our every-two-months release cadence. https://github.com/ceph/go-ceph/releases/tag/v0.19.0 More details are available at the link above. The library includes bindings that aim to play a

[ceph-users] What happens when a DB/WAL device runs out of space?

2022-12-13 Thread Vladimir Brik
Hello I have a bunch of HDD OSDs with DB/WAL devices on SSD. If the current trends continue, the DB/WAL devices will become full before the HDDs completely fill up (e.g. a 50% full HDD has DB/WAL device that is about 65% full). Will anything terrible happen when DB/WAL devices fill up?

[ceph-users] Re: pacific: ceph-mon services stopped after OSDs are out/down

2022-12-13 Thread Mevludin Blazevic
It's very strange. The keyring of the ceph monitor is the same as on one of the working monitor hosts. The failed mon and the working mons also have the same SELinux policies and firewalld settings. The connection is also present, since all OSD daemons are up on the failed ceph monitor node.

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-13 Thread Sascha Lucas
Hi, On Mon, 12 Dec 2022, Sascha Lucas wrote: On Mon, 12 Dec 2022, Gregory Farnum wrote: Yes, we’d very much like to understand this. What versions of the server and kernel client are you using? What platform stack — I see it looks like you are using CephFS through the volumes interface? The

[ceph-users] Re: pacific: ceph-mon services stopped after OSDs are out/down

2022-12-13 Thread Eugen Block
Did you check the permissions? To me it reads like the permission denied errors prevent the MONs from starting and then as a result they are removed from the monmap: ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug 2022-12-13T10:24:21.599+ 7f317ba4d700 -1

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-13 Thread Xiubo Li
On 13/12/2022 18:57, Stolte, Felix wrote: Hi Xiubo, Thx for pointing me in the right direction. All involved ESX hosts seem to use the correct policy. I am going to detach the LUN on each host one by one until I find the host causing the problem. From the logs it means the client was

[ceph-users] Re: pacific: ceph-mon services stopped after OSDs are out/down

2022-12-13 Thread Mevludin Blazevic
The keyring is the same, but I found the following log lines: Dec 13 12:22:18 sparci-store1 ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[813780]: debug 2022-12-13T11:22:18.016+ 7f789e7f3700  0 mon.sparci-store1@1(probing) e18  removed from monmap, suicide. Dec 13 12:22:18

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-13 Thread Stolte, Felix
Hi Xiubo, Thx for pointing me in the right direction. All involved ESX hosts seem to use the correct policy. I am going to detach the LUN on each host one by one until I find the host causing the problem. Regards Felix

[ceph-users] Re: pacific: ceph-mon services stopped after OSDs are out/down

2022-12-13 Thread Eugen Block
So you get "Permission denied" errors, I'm guessing either the mon keyring is not present (or wrong) or the mon directory doesn't belong to the ceph user. Can you check ls -l /var/lib/ceph/FSID/mon.sparci-store1/ Compare the keyring file with the ones on the working mon nodes. Zitat von

[ceph-users] Re: pacific: ceph-mon services stopped after OSDs are out/down

2022-12-13 Thread Mevludin Blazevic
Hi Eugen, I assume the mon db is stored on the "OS disk". I could not find any error-related lines in cephadm.log; here is what journalctl -xe tells me: Dec 13 11:24:21 sparci-store1 ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug 2022-12-13T10:24:21.392+

[ceph-users] Re: Set async+rdma in Ceph cluster, then stuck

2022-12-13 Thread Mitsumasa KONDO
Hi Serkan, Thanks for your reply. -- Server setting -- OS: Ubuntu 20.04 LTS NIC: Mellanox ConnectX-6 EN Driver: MLNX_OFED_LINUX-5.6-2.0.9.0-ubuntu20.04-x86_64 My ceph.conf is as follows: -- ceph.conf -- # minimal ceph.conf for 2f383ac8-76cb-11ed-bfbc-6dd8bf17bdf9 [global] fsid

[ceph-users] Re: Migrate Individual Buckets

2022-12-13 Thread Janne Johansson
On Mon, 12 Dec 2022 at 21:18, Benjamin.Zieglmeier wrote: > We are in the process of building new stage (non-production) Ceph RGW > clusters hosting S3 buckets. We are looking to have our customers migrate > their non-production buckets to these new clusters. We want to help ease the >