[ceph-users] PG has no primary osd

2021-07-13 Thread Andres Rojas Guerrero
Hi, recently in a Nautilus cluster version 14.2.6 I changed the CRUSH rule failure domain from osd to host. All seems OK, but now I have "PG not deep-scrubbed in time" warnings with all PGs in active+clean state: # ceph status cluster: id: c74da5b8-3d1b-483e-8b3a-739134db6cf8 health:
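For reference, a minimal sketch (not from the original thread) of how the affected PGs can be listed and a deep scrub kicked off by hand; the PG id 2.1f is just a placeholder for whatever 'ceph health detail' reports:

    # ceph health detail | grep 'not deep-scrubbed'
    # ceph pg deep-scrub 2.1f

Repeat the deep-scrub command for each reported PG, or temporarily raise osd_max_scrubs if the backlog is large.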

[ceph-users] Re: MDS cache tunning

2021-06-02 Thread Andres Rojas Guerrero
Hi, after one week with only one MDS all the errors have vanished and the cluster is running smoothly! Thank you very much for the help!! On 27/5/21 at 9:50, Andres Rojas Guerrero wrote: Thank you very much, very good explanation!! On 27/5/21 at 9:42, Dan van der Ster wrote:

[ceph-users] Re: MDS cache tunning

2021-05-27 Thread Andres Rojas Guerrero
Thank you very much, very good explanation!! On 27/5/21 at 9:42, Dan van der Ster wrote: between 100-200

[ceph-users] Re: MDS cache tunning

2021-05-27 Thread Andres Rojas Guerrero
Oh, very interesting!! I have reduced the number of MDSs to one. Just one more question, out of curiosity: above what number can we consider that there are many clients? On 27/5/21 at 9:24, Dan van der Ster wrote: On Thu, May 27, 2021 at 9:21 AM Andres Rojas Guerrero wrote: On 26
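As a sketch of the step mentioned here (reducing to a single active MDS), on Nautilus this is normally done by lowering max_mds on the filesystem; the filesystem name cephfs below is a placeholder:

    # ceph fs set cephfs max_mds 1

Since Mimic the surplus ranks are stopped automatically once max_mds is lowered, so no explicit deactivation step should be needed.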

[ceph-users] Re: MDS cache tunning

2021-05-27 Thread Andres Rojas Guerrero
On 26/5/21 at 16:51, Dan van der Ster wrote: I see you have two active MDSs. Is your cluster more stable if you use only a single active MDS? Good question!! I read in the Ceph docs: "You should configure multiple active MDS daemons when your metadata performance is bottlenecked on the

[ceph-users] Re: MDS cache tunning

2021-05-26 Thread Andres Rojas Guerrero
al, and doesn't necessarily indicate any problem. It *would* be a problem if the MDS memory grows uncontrollably, however. Otherwise, check those new defaults for caps recall -- they were released around 14.2.19 IIRC. -- Dan On Wed, May 26, 2021 at 12:46 PM Andres Rojas Guerrero wrote: Thanks

[ceph-users] Re: MDS cache tunning

2021-05-26 Thread Andres Rojas Guerrero
e any reason you want to start adjusting these params? Best Regards, Dan [1] https://github.com/ceph/ceph/pull/38574 On Wed, May 26, 2021 at 11:58 AM Andres Rojas Guerrero wrote: Hi all, I have observed that the MDS Cache Configuration has 18 parameters: mds_cache_memory_

[ceph-users] MDS cache tunning

2021-05-26 Thread Andres Rojas Guerrero
Hi all, I have observed that the MDS Cache Configuration has 18 parameters: mds_cache_memory_limit mds_cache_reservation mds_health_cache_threshold mds_cache_trim_threshold mds_cache_trim_decay_rate mds_recall_max_caps mds_recall_max_decay_threshold mds_recall_max_decay_rate mds_recall_global_max
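As an illustration only (not part of the original message), the parameter most often adjusted from this list is the cache size, which can be inspected and changed at runtime; the 8 GiB value is just an example:

    # ceph config dump | grep mds_cache_memory_limit
    # ceph config set mds mds_cache_memory_limit 8589934592

The remaining recall/trim parameters control how aggressively the MDS asks clients to release caps and usually only need tuning when cap-release warnings appear.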

[ceph-users] MDS process large memory consumption

2021-05-19 Thread Andres Rojas Guerrero
Hi all, I have observed that in a Nautilus (14.2.6) cluster the mds process on the MDS server is consuming a large amount of memory; for example, on an MDS server with 128 GB of RAM I have observed the mds process consuming ~80 GB: ceph 20 0 78,8g 77,1g 13772 S 4,0 61,5 28:37
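A sketch of how the cache usage can be compared with the process RSS on the MDS host; mds.<name> stands for the local daemon id and is not taken from the original post:

    # ceph daemon mds.<name> cache status
    # ceph daemon mds.<name> dump_mempools

If the cache stays near mds_cache_memory_limit while the RSS keeps growing, that would point to a leak rather than normal cache usage.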

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
On 06.05.2021 at 15:21, Andres Rojas Guerrero wrote: Yes, my ceph version is Nautilus: # ceph -v ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable) First dump the crush map:

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
On 06.05.2021 at 15:21, Andres Rojas Guerrero wrote: Yes, my ceph version is Nautilus: # ceph -v ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable) First dump the crush map: # ceph osd getcrushmap -o crush_map Then, decompile the crush map
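A sketch of the complete edit cycle being described in this thread (file names are arbitrary):

    # ceph osd getcrushmap -o crush_map
    # crushtool -d crush_map -o crush_map.txt
    (edit crush_map.txt)
    # crushtool -c crush_map.txt -o crush_map_new
    # crushtool -i crush_map_new --test --show-statistics
    # ceph osd setcrushmap -i crush_map_new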

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
/5/21 at 14:13, Eugen Block wrote: Interesting, I haven't had that yet with crushtool. Your ceph version is Nautilus, right? And you did decompile the binary crushmap with crushtool, correct? I don't know how to reproduce that. Quoting Andres Rojas Guerrero: I have this error

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
inst host failure you'll have to go through that at some point. https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/ Quoting Andres Rojas Guerrero: Hi, I am trying to make a new crush rule (Nautilus) in order to take the new

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
emporarily) unavailable PGs. But to make your cluster resilient against host failure you'll have to go through that at some point. https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/ Quoting Andres Rojas Guerrero:

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
pond to capability release; 2 MDSs report slow metadata IOs; 1 MDSs report slow requests MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability release mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to capability release client_id: 1524269 mdsceph2mon01(mds.0): Client nxtcl

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
Thanks, I will test it. On 5/5/21 at 16:37, Joachim Kraftmayer wrote: Create a new crush rule with the correct failure domain, test it properly and assign it to the pool(s).
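A hedged sketch of what that could look like; the rule, profile, and pool names as well as the k/m values are placeholders, not taken from this cluster. For a replicated pool a host-level rule is enough; for an EC pool the failure domain comes from the erasure-code profile used by the rule:

    # ceph osd crush rule create-replicated replicated_host default host
    # ceph osd erasure-code-profile set ec-host-profile k=4 m=2 crush-failure-domain=host
    # ceph osd crush rule create-erasure ec-host-rule ec-host-profile
    # ceph osd pool set <pool> crush_rule <rule>

Testing with 'crushtool --test' before assigning the rule, as suggested, avoids surprises with unmapped PGs.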

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
Nice observation, how can I avoid this problem? On 5/5/21 at 14:54, Robert Sander wrote: Hi, On 05.05.21 at 13:39, Joachim Kraftmayer wrote: the crush rule with ID 1 distributes your EC chunks over the OSDs without considering the Ceph host. As Robert already suspected. Yes, the "nxt

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_n

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
fy the crush map? On 5/5/21 at 11:55, Robert Sander wrote: Hi, On 05.05.21 at 11:44, Andres Rojas Guerrero wrote: I have 768 OSDs in the cluster; it is enough for 32 (~4%) of them (on the same node) to fail and the information becomes inaccessible. Is it

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
redundancy: 7384199/222506256 objects degraded (3.319%), 2925 pgs degraded, 2925 pgs undersized On 5/5/21 at 11:20, Andres Rojas Guerrero wrote: They are located on a single node ... On 5/5/21 at 11:17, Burkhard Linke wrote: Hi, On 05.05.21 1

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
They are located on a single node ... On 5/5/21 at 11:17, Burkhard Linke wrote: Hi, On 05.05.21 11:07, Andres Rojas Guerrero wrote: Sorry, I have not understood the problem well; the problem I see is that once the OSD fails, the cluster recovers but t

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
I see the problem: when the OSDs fail, the MDSs fail with errors of the type "slow metadata, slow requests" but do not recover once the cluster has recovered ... Why? On 5/5/21 at 11:07, Andres Rojas Guerrero wrote: Sorry, I have not understood the problem well, the pro

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
18 down On 5/5/21 at 11:00, Andres Rojas Guerrero wrote: Yes, the main problem is that the MDSs start to report slow operations and the information is no longer accessible, and the cluster never recovers. # ceph status cluster: id: c7

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
'ceph health detail' could be useful. On 05/05 10:48, Andres Rojas Guerrero wrote: Hi, I have a Nautilus cluster version 14.2.6, and I have noted that when some OSDs go down the cluster doesn't start to recover. I have checked that the option

[ceph-users] Ceph cluster not recover after OSD down

2021-05-05 Thread Andres Rojas Guerrero
Hi, I have a Nautilus cluster version 14.2.6, and I have noted that when some OSDs go down the cluster doesn't start to recover. I have checked that the option noout is unset. What could be the reason for this behavior?
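A few standard commands that help show why recovery does not start (nothing here is specific to this cluster):

    # ceph health detail
    # ceph osd dump | grep flags
    # ceph osd tree down
    # ceph pg dump_stuck

These show any cluster-wide flags (noout, norecover, nobackfill, ...), which OSDs are down, and which PGs are stuck and why.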

[ceph-users] Re: Unable to increase PG numbers

2020-02-25 Thread Andres Rojas Guerrero
On Mon, Feb 24, 2020 at 9:44 AM Andres Rojas Guerrero wrote: I have tried to increase to 16, with the same result:

[ceph-users] Re: Unable to increase PG numbers

2020-02-25 Thread Andres Rojas Guerrero
That's right!! I will try to update, but now I have the desired PG numbers. Thank you. On 25/2/20 at 15:01, Wesley Dillingham wrote: ceph osd require-osd-release nautilus
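For context, a sketch of the sequence implied by this fix (the target pg_num is a placeholder; confirm first that every OSD actually runs Nautilus):

    # ceph osd versions
    # ceph osd require-osd-release nautilus
    # ceph osd pool set cephfs_data pg_num <target_pg_num>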

[ceph-users] Re: Unable to increase PG numbers

2020-02-24 Thread Andres Rojas Guerrero
I have tried to increase to 16, with the same result: # ceph osd pool set cephfs_data pg_num 16 set pool 1 pg_num to 16 # ceph osd pool get cephfs_data pg_num pg_num: 8 On 24/2/20 at 15:10, Gabryel Mason-Williams wrote: Have you tried making a smaller increment instead of jumping from 8

[ceph-users] Unable to increase PG numbers

2020-02-24 Thread Andres Rojas Guerrero
Hi, I have a Nautilus installation version 14.2.1 with a very unbalanced cephfs pool; I have 430 OSDs in the cluster but this pool only has 8 PGs (pg_num and pgp_num) and 118 TB used: # ceph -s cluster: id: a2269da7-e399-484a-b6ae-4ee1a31a4154 health: HEALTH_WARN 1 nearfull osd(s)