[ceph-users] pgs stuck backfill_toofull

2020-10-28 Thread Mark Johnson
I've been struggling with this one for a few days now. We had an OSD report as near full a few days ago. Had this happen a couple of times before and a reweight-by-utilization has sorted it out in the past. Tried the same again but this time we ended up with a couple of pgs in a state of
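The usual sequence for PGs stuck backfill_toofull can be sketched as below. The threshold value and ratio are illustrative, not taken from the thread, and the commands are echoed rather than executed since they need a live cluster:

```shell
# Rough diagnostic/remediation sequence for backfill_toofull (sketch).
# Values are illustrative; commands are echoed, not executed.
THRESHOLD=110   # reweight OSDs more than 10% above average utilization
echo "ceph health detail"
echo "ceph osd df tree"                          # find the near-full OSDs
echo "ceph osd reweight-by-utilization ${THRESHOLD}"
echo "ceph osd set-backfillfull-ratio 0.92"      # temporary headroom; revert afterwards
```

Raising the backfillfull ratio only buys headroom for backfill to proceed; the lasting fix is rebalancing data off the full OSDs.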

[ceph-users] Re: frequent Monitor down

2020-10-28 Thread Andrei Mikhailovsky
Eugen, I've got four physical servers and I've installed a mon on each of them. I discussed it with Wido and a few other chaps from Ceph and there is no issue in doing that. Quorum issues would happen if you had 2 mons; if you've got more than 2 you should be fine. Andrei - Original

[ceph-users] Monitor persistently out-of-quorum

2020-10-28 Thread Ki Wong
Hello, I am at my wit's end. So I made a mistake in the configuration of my router, and one of the monitors (out of 3) dropped out of the quorum, and nothing I’ve done allows it to rejoin. That includes reinstalling the monitor with ceph-ansible. The connectivity issue is fixed. I’ve tested it
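One thing worth verifying after a router fix: a MON can be pingable yet still unreachable on its messenger ports (v2 uses 3300 and v1 uses 6789 by default). A bash-only TCP check, with a placeholder hostname:

```shell
# Check TCP reachability of a monitor's messenger ports using bash's
# /dev/tcp. "mon1.example.com" is a placeholder hostname.
check_port() {
  timeout 2 bash -c ">/dev/tcp/$1/$2" 2>/dev/null && echo open || echo closed
}
for p in 3300 6789; do
  echo "mon1.example.com:${p} $(check_port mon1.example.com ${p})"
done
```

If either port shows closed from the other MON hosts, the daemon can never rejoin quorum no matter how often it is reinstalled.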

[ceph-users] Re: frequent Monitor down

2020-10-28 Thread Andrei Mikhailovsky
Yes, I have, Eugen, I see no obvious reason / error / etc. I see a lot of entries relating to Compressing as well as monitor going down. Andrei - Original Message - > From: "Eugen Block" > To: "ceph-users" > Sent: Wednesday, 28 October, 2020 11:51:20 > Subject: [ceph-users] Re:

[ceph-users] Re: Ceph User Survey 2020 - Working Group Invite

2020-10-28 Thread Mike Perez
Hi all, please join here: https://meet.google.com/_meet/ush-zpjg-wab?ijlm=1603919117370=130 On Fri, Oct 9, 2020 at 10:25 AM wrote: > Hello all, > > This is an invite to all interested to join a working group being formed > for 2020 Ceph User Survey planning. The focus is to augment the >

[ceph-users] Re: frequent Monitor down

2020-10-28 Thread Eugen Block
Why do you have 4 MONs in the first place? That way a quorum is difficult to achieve; could it be related to that? Quoting Andrei Mikhailovsky: Yes, I have, Eugen, I see no obvious reason / error / etc. I see a lot of entries relating to Compressing as well as monitor going down.
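The arithmetic behind Eugen's point: Paxos needs a strict majority of monitors, so an even count adds a voter without adding failure tolerance.

```shell
# Majority quorum: q = floor(n/2) + 1; tolerated failures = n - q.
# 4 MONs tolerate the same single failure as 3.
for n in 3 4 5; do
  q=$(( n / 2 + 1 ))
  echo "${n} mons: quorum ${q}, tolerates $(( n - q )) down"
done
```

This is why odd monitor counts (3 or 5) are the usual recommendation.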

[ceph-users] frequent Monitor down

2020-10-28 Thread Andrei Mikhailovsky
Hello everyone, I am seeing regular messages that the Monitors are going down and up: 2020-10-27T09:50:49.032431+ mon.arh-ibstorage2-ib (mon.1) 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Eugen Block
You have many unknown PGs because you removed lots of OSDs, this is likely to be a problem. Are the removed OSDs still at hand? It's possible that you could extract PGs which are missing and import them on healthy OSDs, but that's a lot of manual work. Do you have backups of the data? Then
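The extraction Eugen describes is done with ceph-objectstore-tool. A hedged sketch, with hypothetical OSD paths and PG id; the OSDs involved must be stopped, and the commands are echoed here rather than executed:

```shell
# Sketch: salvage a PG from a removed-but-intact OSD and import it on a
# healthy one. Paths, OSD ids and the PG id are hypothetical.
SRC=/var/lib/ceph/osd/ceph-7    # removed OSD's data dir (assumption)
DST=/var/lib/ceph/osd/ceph-3    # healthy, stopped OSD (assumption)
PGID=2.1a                       # one of the unknown PGs (assumption)
echo "ceph-objectstore-tool --data-path ${SRC} --pgid ${PGID} --op export --file /tmp/${PGID}.export"
echo "ceph-objectstore-tool --data-path ${DST} --op import --file /tmp/${PGID}.export"
```

As the message says, this is per-PG manual work and only helps if the removed OSDs' data is still intact.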

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Frank Schilder
Hi all, I need to go back to a small piece of information: > I had 3 mons, but I have 2 physical datacenters; one of them broke with > no short-term fix, so I removed all OSDs and ceph mons (2 of them) and > now I have only the OSDs of 1 datacenter with the monitor. When I look at the data about

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Ing. Luis Felipe Domínguez Vega
EC profile: https://pastebin.ubuntu.com/p/kjbdQXbs85/ ceph pg dump pgs | grep -v "active+clean": https://pastebin.ubuntu.com/p/g6TdZXNXBR/ On 2020-10-28 02:23, Eugen Block wrote: If you have that many spare hosts I would recommend to deploy two more MONs on them, and probably also

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Ing. Luis Felipe Domínguez Vega
Great response, thanks. I will now use only one site, but first I need to stabilize the cluster so I can remove erasure coding and use replication. Could you help me? The thing is that I have 2 pools, cinder-ceph and data_storage. data_storage serves only as the data path for the cinder-ceph pool, but now I
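One hedged route off an EC pool is to create a replicated pool and copy the objects across with "rados cppool" (deprecated; it ignores snapshots and the source pool must be quiescent, so treat this strictly as a sketch). The new pool name and PG count are made up:

```shell
# Sketch: migrate data off an EC pool to a replicated one. Commands are
# echoed, not executed; "data_storage_repl" and 128 PGs are assumptions.
NEW_POOL=data_storage_repl
echo "ceph osd pool create ${NEW_POOL} 128 128 replicated"
echo "rados cppool data_storage ${NEW_POOL}"
```

After the copy, the consumer (here, whatever points at data_storage) has to be repointed at the new pool before the old one is deleted.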

[ceph-users] Re: OSD down, how to reconstruct it from its main and block.db parts ?

2020-10-28 Thread David Caro
Hi Wladimir, according to the logs you first sent, it seems there is an authentication issue (the osd daemon not being able to fetch the mon config): > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 > 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at >
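When the "no keyring found" error is the cause, the OSD's key can usually be re-exported from the cluster auth database into its data directory. The OSD id is hypothetical; run with admin credentials. Commands are echoed, not executed:

```shell
# Sketch: restore a missing OSD keyring from the cluster auth database.
# OSD id 5 is a placeholder.
OSD_ID=5
OSD_DIR=/var/lib/ceph/osd/ceph-${OSD_ID}
echo "ceph auth get osd.${OSD_ID} -o ${OSD_DIR}/keyring"
echo "chown ceph:ceph ${OSD_DIR}/keyring"
echo "systemctl restart ceph-osd@${OSD_ID}"
```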

[ceph-users] Re: frequent Monitor down

2020-10-28 Thread Eugen Block
Have you looked into syslog and mon logs? Quoting Andrei Mikhailovsky: Hello everyone, I am seeing regular messages that the Monitors are going down and up: 2020-10-27T09:50:49.032431+ mon.arh-ibstorage2-ib (mon.1) 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Eugen Block
If you have that many spare hosts I would recommend deploying two more MONs on them, and probably additional MGRs as well so they can fail over. What is the EC profile for the data_storage pool? Can you also share ceph pg dump pgs | grep -v "active+clean" to see which PGs are affected? The
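The two pieces of information requested map onto commands like these (the pool name is from the thread; the profile name comes out of the first command, so it is left as a placeholder):

```shell
# Commands to gather the requested diagnostics. Echoed, not executed.
POOL=data_storage
echo "ceph osd pool get ${POOL} erasure_code_profile"
echo "ceph osd erasure-code-profile get <profile-name>"
echo 'ceph pg dump pgs | grep -v "active+clean"'
```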