[ceph-users] Re: Leader election, how to notice it?

2021-10-03 Thread Konstantin Shalygin
Hi, You can always get the leader from the quorum status: ceph quorum_status | jq -r '.quorum_leader_name' Cheers, k > On 3 Oct 2021, at 10:21, gustavo panizzo wrote: > > Instead of setting up pacemaker or similar I'd like to only run the > application in the same machine > as the leader Mon. At
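A minimal polling sketch along these lines (assumptions: the mon names match "hostname -s", and the application runs as a hypothetical systemd unit called "myapp"):

  #!/bin/bash
  # Sketch only: poll the quorum leader and start/stop the (hypothetical)
  # "myapp" unit depending on whether this node is currently the leader.
  while sleep 30; do
      leader=$(ceph quorum_status | jq -r '.quorum_leader_name')
      if [ "$leader" = "$(hostname -s)" ]; then
          systemctl is-active --quiet myapp || systemctl start myapp
      else
          systemctl is-active --quiet myapp && systemctl stop myapp
      fi
  done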

[ceph-users] Re: Leader election loop reappears

2021-10-03 Thread Manuel Holtgrewe
After still more digging, I found the following high numbers of failed connection attempts on my osd nodes, see the netstat output at the bottom (nstat is also useful, as it lets you reset the counters). The failed connection attempts could be too high. I found an old thread on the mailing list that recom
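For reference, the counters in question can be inspected roughly like this (a sketch; counter names can vary by kernel):

  # cumulative TCP failure counters since boot
  netstat -s | grep -i 'failed connection'
  # the same counters by name; a plain "nstat" run prints deltas since the
  # previous run, which is the "reset" behaviour mentioned above
  nstat -az TcpAttemptFails TcpExtListenDrops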

[ceph-users] Leader election, how to notice it?

2021-10-03 Thread gustavo panizzo
Hello, I have an application that talks to Ceph's control plane; I can only run a single copy of it. Instead of setting up pacemaker or similar, I'd like to run the application only on the same machine as the leader Mon. At the start of the application I can detect who's the leader, but if the leade

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-03 Thread Stefan Kooman
On 10/1/21 14:07, von Hoesslin, Volker wrote: Is there any chance to fix this? There are some "advanced metadata repair tools" (https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/ ) but I'm not really sure is it th
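Whatever route you take, the linked page starts with taking a backup of the MDS journal before attempting any repair; roughly (a sketch, with <fs_name> standing in for the actual file system name):

  # export the journal of rank 0 to a file before running any repair tools
  cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin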

[ceph-users] Re: How to get ceph bug 'non-errors' off the dashboard?

2021-10-03 Thread Harry G. Coin
Worked very well! Thank you. Harry Coin On 10/2/21 11:23 PM, 胡 玮文 wrote: Hi Harry, Please try these commands in the CLI: ceph health mute MGR_MODULE_ERROR ceph health mute CEPHADM_CHECK_NETWORK_MISSING Weiwen Hu On 3 Oct 2021, at 05:37, Harry G. Coin wrote: I need help getting two 'non errors' of
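For the archives: the mutes can also be given a TTL so the warning resurfaces if the condition persists, and they can be undone, e.g. (a sketch):

  # mute for one week instead of indefinitely
  ceph health mute CEPHADM_CHECK_NETWORK_MISSING 1w
  # lift a mute again
  ceph health unmute MGR_MODULE_ERROR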

[ceph-users] ceph-objectstore-tool core dump

2021-10-03 Thread Michael Thomas
I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph cluster. I was able to determine that they are all coming from the same OSD: osd.143. This host recently suffered from an unplanned power loss, so I'm not surprised that there may be some corruption. This PG is part of
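For context, ceph-objectstore-tool runs against the data path of a stopped OSD; the general shape of such an invocation is roughly as follows (a sketch, assuming a package-based install for the unit name; 7.1ac is a placeholder PG id):

  systemctl stop ceph-osd@143
  # list the PGs held by this OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 --op list-pgs
  # export one PG to a file
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 \
      --pgid 7.1ac --op export --file /tmp/7.1ac.export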

[ceph-users] Re: ceph-objectstore-tool core dump

2021-10-03 Thread 胡 玮文
> On 4 Oct 2021, at 00:53, Michael Thomas wrote: > > I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph > cluster. I was able to determine that they are all coming from the same OSD: > osd.143. This host recently suffered from an unplanned power loss, so I'm > not surprised th

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread 胡 玮文
The stack trace (tcmalloc::allocate_full_cpp_throw_oom) seems to indicate that you don't have enough memory. From: Szabo, Istvan (Agoda) Sent: 4 October 2021 0:46 To: Igor Fedotov Cc: ceph-users@ceph.io Subject: [ceph-users] Re:
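A quick sanity check along these lines may help (a sketch; osd.0 is a placeholder id):

  # free memory on the node
  free -h
  # per-OSD memory target currently in effect
  ceph config get osd.0 osd_memory_target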

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread 胡 玮文
Sorry, I read it again and found “tcmalloc: large alloc 94477368950784 bytes == (nil)”. This unrealistically large malloc seems to indicate a bug, but I didn't find one in the tracker. From: Szabo, Istvan (Agoda) Sent: Monday, October 4, 2021 12:45:20 AM To: Igor Fed

[ceph-users] Re: ceph-objectstore-tool core dump

2021-10-03 Thread Michael Thomas
On 10/3/21 12:08, 胡 玮文 wrote: On 4 Oct 2021, at 00:53, Michael Thomas wrote: I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph cluster. I was able to determine that they are all coming from the same OSD: osd.143. This host recently suffered from an unplanned power loss, so

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread Szabo, Istvan (Agoda)
Seems like it cannot start anymore once migrated ☹ https://justpaste.it/5hkot Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com -
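For reference, the migration command involved here (ceph-volume lvm migrate) has roughly this shape, with placeholder id/fsid and the OSD stopped first (a sketch, not the exact invocation used):

  # move the db and wal back onto the main data volume of the OSD
  ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> \
      --from db wal --target <vg_name>/<data_lv_name>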

[ceph-users] Re: ceph-objectstore-tool core dump

2021-10-03 Thread Dave Hall
Hello, I have also recently dealt with a couple of inconsistent PGs - EC 8+2 on 12TB HDDs. In one case, 'ceph pg repair' was able to clear the issue. In a second case it would not do so without further intervention. As I found documented, I used 'ceph health detail' to locate the problem PG, and the
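In rough outline, the documented workflow is (a sketch; substitute the real PG id):

  ceph health detail                                        # locate the inconsistent PG
  rados list-inconsistent-obj <pgid> --format=json-pretty   # see which object/shard is bad and why
  ceph pg repair <pgid>                                     # ask the primary OSD to repair it
  ceph pg deep-scrub <pgid>                                 # verify afterwards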

[ceph-users] Re: ceph-objectstore-tool core dump

2021-10-03 Thread 胡 玮文
> On 4 Oct 2021, at 04:18, Michael Thomas wrote: > > On 10/3/21 12:08, 胡 玮文 wrote: On 4 Oct 2021, at 00:53, Michael Thomas wrote: >>> >>> I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph >>> cluster. I was able to determine that they are all coming from the same >>> OSD: osd.143