[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Matthew Vernon
On 26/10/2020 14:13, Ing. Luis Felipe Domínguez Vega wrote: How can I free the ceph monitor's store?
root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle# du -h -d1
542G    ./store.db
542G    .
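A minimal sketch of how one might inspect and compact the mon store, assuming the mon id "fond-beagle" from the prompt above; note that compaction usually frees little while the cluster is unhealthy, because the monitor has to keep old cluster maps until the PGs are clean again:

  du -sh /var/lib/ceph/mon/ceph-fond-beagle/store.db   # how big the store really is
  ceph tell mon.fond-beagle compact                    # ask the mon to compact its store now
  ceph config set mon mon_compact_on_start true        # or compact on the next mon restart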

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Ing. Luis Felipe Domínguez Vega
Exactly, the cluster is recovering from a huge break, but I don't see any progress on "recovering"; it doesn't show the progress of the recovery. -- cluster: id: 039bf268-b5a6-11e9-bbb7-d06726ca4a78 h
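A few read-only commands that are commonly used to watch recovery progress (a generic sketch, nothing cluster-specific assumed):

  ceph -s              # overall status, including the recovery/backfill summary
  ceph health detail   # per-warning detail, e.g. which PGs are degraded or stuck
  ceph pg stat         # one-line PG state summary
  ceph -w              # follow the cluster log live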

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread 胡 玮文
> On 2020-10-26 at 23:29, Ing. Luis Felipe Domínguez Vega wrote:
> > mgr: fond-beagle(active, since 39s)
Your manager seems to be crash looping; it only started 39s ago. Looking at the mgr logs may help you identify why your cluster is not recovering. You may have hit a bug in the mgr.
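A sketch of how one might look at the mgr logs, assuming a package-based install with the systemd unit ceph-mgr@fond-beagle and the default log path, and that the crash module is enabled (it is by default on recent releases):

  journalctl -u ceph-mgr@fond-beagle --since "1 hour ago"   # daemon log via systemd
  tail -f /var/log/ceph/ceph-mgr.fond-beagle.log            # or the file log, if file logging is on
  ceph crash ls                                             # recent daemon crashes recorded by the crash module
  ceph crash info <crash-id>                                # backtrace of one crash; <crash-id> is a placeholder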

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Ing. Luis Felipe Domínguez Vega
On 2020-10-26 12:23, 胡 玮文 wrote: On 2020-10-26 at 23:29, Ing. Luis Felipe Domínguez Vega wrote: mgr: fond-beagle(active, since 39s) Your manager seems to be crash looping; it only started 39s ago. Looking at the mgr logs may help you identify why your cluster is not recovering. You may have hit a bug in the m

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Eugen Block
You could stop the MGRs and wait for the recovery to finish; MGRs are not a critical component. You won't have a dashboard or metrics during that time, but it would prevent the high RAM usage. Quoting "Ing. Luis Felipe Domínguez Vega": On 2020-10-26 12:23, 胡 玮文 wrote: On 2020-10-26,
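A sketch of stopping the mgr daemons on a package-based install (the unit names are an assumption; adjust them to the actual host and daemon name):

  systemctl stop ceph-mgr.target        # stop all mgr daemons on this host
  systemctl stop ceph-mgr@fond-beagle   # or just this one instance
  systemctl start ceph-mgr.target       # later, once the recovery has finished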

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Ing. Luis Felipe Domínguez Vega
On 2020-10-26 15:16, Eugen Block wrote: You could stop the MGRs and wait for the recovery to finish; MGRs are not a critical component. You won't have a dashboard or metrics during that time, but it would prevent the high RAM usage. Quoting "Ing. Luis Felipe Domínguez Vega": On 2020-10

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Eugen Block
The recovery process (ceph -s) is independent of the MGR service and depends only on the MON service. It seems you have only the one MON; if the MGR is overloading it (it's not clear why), it could help to leave the MGR off and see whether the MON service then has enough RAM to proceed with the recovery.

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Ing. Luis Felipe Domínguez Vega
I had 3 mons, but I have 2 physical datacenters and one of them broke with no short-term fix, so I removed all of its OSDs and ceph mons (2 of them), and now I only have the OSDs of 1 datacenter with the remaining monitor. I had stopped the ceph manager, but I saw that when I restart a ceph manager then ceph -s
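For reference, removing a dead monitor and its dead OSDs is usually done roughly like this (a sketch with placeholder ids, not a record of the exact commands used here):

  ceph mon remove <mon-id>                        # drop the dead mon from the monmap
  ceph osd purge <osd-id> --yes-i-really-mean-it  # removes the OSD from CRUSH, auth and the osdmap in one step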

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Ing. Luis Felipe Domínguez Vega
The ceph mon logs... many of these, non-stop, in my log: --
2020-10-26T15:40:28.875729-0400 osd.23 [WRN] slow request osd_op(client.86168166.0:9023356 5.56 5.1cd5a6d6 (undecoded) ondisk+retry+write+known_if_redirected e159644) initiated 2020-
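To dig into slow requests like the one above, the OSD's admin socket can be queried on the host that runs osd.23 (a sketch; the osd id is taken from the log line):

  ceph daemon osd.23 dump_ops_in_flight   # requests currently stuck in that OSD
  ceph daemon osd.23 dump_historic_ops    # recently completed slow requests, with per-step timings
  ceph health detail                      # which OSDs are reporting slow ops cluster-wide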

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-27 Thread Eugen Block
Hi, just to clarify so I don't miss anything: you have two DCs and one of them is down. And two of the MONs were in that failed DC? Now you removed all OSDs and two MONs from the failed DC hoping that your cluster will recover? If you have reasonable crush rules in place (e.g. to recover

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-27 Thread Ing. Luis Felipe Domínguez Vega
I understand, but I deleted the OSDs from the CRUSH map, so ceph won't wait for those OSDs, am I right? On 2020-10-27 04:06, Eugen Block wrote: Hi, just to clarify so I don't miss anything: you have two DCs and one of them is down. And two of the MONs were in that failed DC? Now you removed all O
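For completeness, a sketch of fully removing an OSD so that nothing still references it; removing it from CRUSH alone leaves its auth key and osdmap entry behind (placeholder id):

  ceph osd crush remove osd.<id>   # remove it from the CRUSH map
  ceph auth del osd.<id>           # remove its cephx key
  ceph osd rm <id>                 # remove it from the osdmap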

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-27 Thread Eugen Block
I understand, but I deleted the OSDs from the CRUSH map, so ceph won't wait for those OSDs, am I right? It depends on your actual crush tree and rules. Can you share (maybe you already did) ceph osd tree, ceph osd df, ceph osd pool ls detail and a dump of your crush rules? As I already said, if y
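A dump of the CRUSH rules can be produced either directly or by decompiling the full map (crushtool ships with the ceph packages); a sketch:

  ceph osd crush rule dump            # JSON dump of all rules
  ceph osd getcrushmap -o crush.bin   # or decompile the whole map for reading
  crushtool -d crush.bin -o crush.txt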

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-27 Thread Ing. Luis Felipe Domínguez Vega
Needed data:
ceph -s : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
ceph osd tree : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
ceph osd df : (later; I have been waiting for 10 minutes and there is no output yet)
ceph osd pool ls detail : https://pastebin.ubuntu.com/p

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-27 Thread Eugen Block
Your pool 'data_storage' has a size of 7 (or 7 chunks, since it's erasure-coded) and the rule requires each chunk to be on a different host, but you currently have only 5 hosts available; that's why the recovery is not progressing. It's waiting for two more hosts. Unfortunately, you can't change th
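The relevant numbers can be read straight from the pool and its EC profile; a sketch, assuming the pool name data_storage from above:

  ceph osd pool get data_storage size                   # for an EC pool this is k+m, here 7
  ceph osd pool get data_storage erasure_code_profile   # which profile the pool uses
  ceph osd erasure-code-profile get <profile-name>      # shows k, m and the crush-failure-domain

With a failure domain of "host" and only 5 hosts up, 7 chunks cannot all be placed, which matches the stalled recovery described above.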

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-27 Thread Ing. Luis Felipe Domínguez Vega
Well, 7 hosts are up, and recovery started and then stopped after roughly 3 hours; now the cluster is not recovering any more... could it be that it needs more hosts? On 2020-10-27 13:58, Eugen Block wrote: Hm, it would be new to me that the mgr service is required for recovery, but maybe I missed something a
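To see why it stalls, it can help to look at which PGs are stuck and to query one of them (a sketch with a placeholder PG id):

  ceph pg dump_stuck unclean   # PGs that are not active+clean
  ceph pg <pgid> query         # shows what a particular PG is waiting for (e.g. missing OSDs)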

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-27 Thread Ing. Luis Felipe Domínguez Vega
Well, recovery is not working yet... I started 6 more servers and the cluster has still not recovered. Ceph status does not show any recovery progress.
ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-27 Thread Eugen Block
If you have that many spare hosts I would recommend deploying two more MONs on them, and probably also additional MGRs so they can fail over. What is the EC profile for the data_storage pool? Can you also share ceph pg dump pgs | grep -v "active+clean" to see which PGs are affected? The remai
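The current MON quorum and MGR standby situation can be checked with a few read-only commands before and after adding daemons (nothing assumed beyond a working admin keyring); how the extra MONs/MGRs are actually deployed depends on how the cluster was installed, so that part is left out here:

  ceph mon stat                       # current quorum members
  ceph quorum_status -f json-pretty   # detailed quorum view
  ceph mgr dump                       # active mgr plus the list of standbys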

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Ing. Luis Felipe Domínguez Vega
From: Ing. Luis Felipe Domínguez Vega Sent: 28 October 2020 05:14:27 To: Eugen Block Cc: Ceph Users Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT] Well, recovery is not working yet... I sta

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Ing. Luis Felipe Domínguez Vega
EC profile: https://pastebin.ubuntu.com/p/kjbdQXbs85/
ceph pg dump pgs | grep -v "active+clean": https://pastebin.ubuntu.com/p/g6TdZXNXBR/
On 2020-10-28 02:23, Eugen Block wrote: If you have that many spare hosts I would recommend deploying two more MONs on them, and probably also addition

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Frank Schilder
From: Ing. Luis Felipe Domínguez Vega Sent: 28 October 2020 05:14:27 To: Eugen Block Cc: Ceph Users Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT] Well, recovery is not working yet... I started 6 more servers and the cluster has still not recovered. Ceph status does not show any recovery progress. ceph -s

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Eugen Block
Sent: 28 October 2020 05:14:27 To: Eugen Block Cc: Ceph Users Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT] Well, recovery is not working yet... I started 6 more servers and the cluster has still not recovered. Ceph status does not show any recovery progress. ceph -s : https://past

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-29 Thread Ing. Luis Felipe Domínguez Vega
From: Eugen Block Sent: 28 October 2020 07:23:09 To: Ing. Luis Felipe Domínguez Vega Cc: Ceph Users Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT] If you have that many spare hosts I would recommend deploying two more MONs on them, and probably also additional MG

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-29 Thread Ing. Luis Felipe Domínguez Vega
From: Eugen Block Sent: 28 October 2020 07:23:09 To: Ing. Luis Felipe Domínguez Vega Cc: Ceph Users Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT] If you have that many spare hosts I would recommend deploying two more MONs on them, and probably also additional MGRs so they can fai

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-29 Thread Frank Schilder
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT] If you have that many spare hosts I would recommend deploying two more MONs on them, and probably also additional MGRs so they can fail over. What is the EC profile for the data_storage pool? Can you also share ceph pg dump pgs | grep -v "active+clea