Hi, recently in a Nautilus cluster version 14.2.6 I changed the CRUSH
rule to use host as the failure domain instead of osd. All seems OK, but now I have
"PG not deep-scrubbed in time" with all PGs in active+clean state:
# ceph status
cluster:
id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
health:
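(For reference, a minimal sketch of how one might check and clear that warning; not the thread's actual fix. The PG id is made up and the interval value, in seconds, is only an example:)
# ceph health detail | grep 'not deep-scrubbed'        # list the affected PGs
# ceph pg deep-scrub 2.1f                              # manually queue a deep scrub for one PG
# ceph config set osd osd_deep_scrub_interval 1209600  # or widen the interval if scrubs cannot keep up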
Hi, after one week with only one active MDS all the errors have vanished and
the cluster is running smoothly!
Thank you very much for the help!!
On 27/5/21 at 9:50, Andres Rojas Guerrero wrote:
Thank you very much, very good explanation!!
On 27/5/21 at 9:42, Dan van der Ster wrote:
…between 100-200
Oh, very interesting!! I have reduced the number of MDSs to one. Just one
more question, out of curiosity: above what number would we consider that
there are many clients?
On 27/5/21 at 9:24, Dan van der Ster wrote:
On Thu, May 27, 2021 at 9:21 AM Andres Rojas Guerrero wrote:
On 26/5/21 at 16:51, Dan van der Ster wrote:
I see you have two active MDSs. Is your cluster more stable if you use
only one single active MDS?
Good question!! I read from the Ceph docs:
"You should configure multiple active MDS daemons when your metadata
performance is bottlenecked on the single MDS that runs by default."
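(A minimal sketch of dropping back to a single active MDS, assuming the filesystem is named cephfs; not quoted from the thread:)
# ceph fs set cephfs max_mds 1   # the extra active rank should stop on its own on Nautilus
# ceph fs status                 # verify only one active MDS remains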
…is normal, and doesn't necessarily indicate
any problem.
It *would* be a problem if the MDS memory grows uncontrollably, however.
Otherwise, check those new defaults for caps recall -- they were
released around 14.2.19 IIRC.
-- Dan
On Wed, May 26, 2021 at 12:46 PM Andres Rojas Guerrero wrote:
Thanks
Is there any reason you want to start adjusting these params?
Best Regards,
Dan
[1] https://github.com/ceph/ceph/pull/38574
On Wed, May 26, 2021 at 11:58 AM Andres Rojas Guerrero wrote:
Hi all, I have observed that the MDS Cache Configuration has 18 parameters:
mds_cache_memory_limit
mds_cache_reservation
mds_health_cache_threshold
mds_cache_trim_threshold
mds_cache_trim_decay_rate
mds_recall_max_caps
mds_recall_max_decay_threshold
mds_recall_max_decay_rate
mds_recall_global_max
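(As a sketch of how any one of these can be inspected and changed at runtime on Nautilus; the 8 GiB value is only an example, not a recommendation from the thread:)
# ceph config get mds mds_cache_memory_limit
# ceph config set mds mds_cache_memory_limit 8589934592   # value is in bytes (8 GiB here)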
Hi all, I have observed that in a Nautilus (14.2.6) cluster the mds
process on the MDS server is consuming a large amount of memory; for
example, on an MDS server with 128 GB of RAM I have seen the mds
process consuming ~80 GB:
ceph 20 0 78,8g 77,1g 13772 S 4,0 61,5 28:37
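(A minimal sketch of comparing actual cache usage against the configured limit; the daemon name mds.ceph2mon01 is taken from the health output elsewhere in this thread and may differ, and the first command must be run on the MDS host itself:)
# ceph daemon mds.ceph2mon01 cache status
# ceph config get mds mds_cache_memory_limit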
On 06.05.2021 at 15:21, Andres Rojas Guerrero wrote:
Yes, my ceph version is Nautilus:
# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
nautilus (stable)
First dump the crush map:
# ceph osd getcrushmap -o crush_map
Then, decompile the crush map:
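(The remaining commands were cut off above; the usual edit cycle is sketched here for reference, with arbitrary file names, rather than being the exact steps from the thread:)
# ceph osd getcrushmap -o crush_map
# crushtool -d crush_map -o crush_map.txt
  (edit crush_map.txt)
# crushtool -c crush_map.txt -o crush_map_new
# crushtool -i crush_map_new --test --show-statistics   # sanity-check before injecting
# ceph osd setcrushmap -i crush_map_new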
On …/5/21 at 14:13, Eugen Block wrote:
Interesting, I haven't had that yet with crushtool. Your ceph version is
Nautilus, right? And you did decompile the binary crushmap with
crushtool, correct? I don't know how to reproduce that.
Quoting Andres Rojas Guerrero:
I have this error
> …(temporarily) unavailable PGs. But to
> make your cluster resilient against host failure you'll have to go
> through that at some point.
>
>
> https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/
>
>
> Quoting Andres Rojas Guerrero:
>
>> Hi, I try to make a new crush rule (Nautilus) in order to take the new
…failing to respond to capability release; 2 MDSs
report slow metadata IOs; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability release
mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
capability release client_id: 1524269
mdsceph2mon01(mds.0): Client nxtcl
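(A minimal sketch of how such a stuck client session might be inspected, and evicted as a last resort; the rank and client id are taken from the output above, and eviction is disruptive for that client:)
# ceph tell mds.1 client ls                  # list sessions and their caps on rank 1
# ceph tell mds.1 client evict id=1524269    # last resort: drop the stuck client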
Thanks, I will test it.
On 5/5/21 at 16:37, Joachim Kraftmayer wrote:
Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).
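(A rough sketch of that advice; rule, profile and pool names are made up, the k/m values are placeholders rather than the thread's actual profile, and for an EC pool the new rule's size must match the pool's existing k+m. Reassigning the rule triggers data movement:)
# ceph osd crush rule create-replicated rep_host default host                     # replicated pools
# ceph osd erasure-code-profile set ec_host_profile k=8 m=2 crush-failure-domain=host
# ceph osd crush rule create-erasure ec_host_rule ec_host_profile                 # EC pools
# ceph osd pool set cephfs_data crush_rule ec_host_rule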
Nice observation, how can I avoid this problem?
On 5/5/21 at 14:54, Robert Sander wrote:
Hi,
On 05.05.21 at 13:39, Joachim Kraftmayer wrote:
The crush rule with ID 1 distributes your EC chunks over the OSDs
without considering the Ceph host, as Robert already suspected.
Yes, the "nxt
# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_n
…modify the crush map?
On 5/5/21 at 11:55, Robert Sander wrote:
> Hi,
>
> On 05.05.21 at 11:44, Andres Rojas Guerrero wrote:
>> I have 768 OSDs in the cluster; it is enough for 32 (~4%) of them (in
>> the same node) to fail and the information becomes inaccessible. Is it
Degraded data redundancy: 7384199/222506256 objects degraded
(3.319%), 2925 pgs degraded, 2925 pgs undersized
On 5/5/21 at 11:20, Andres Rojas Guerrero wrote:
They are located on a single node ...
On 5/5/21 at 11:17, Burkhard Linke wrote:
> Hi,
>
> On 05.05.21 11:07, Andres Rojas Guerrero wrote:
>> Sorry, I have not understood the problem well, the problem I see is that
>> once the OSD fails, the cluster recovers but t
I see that problem: when the OSDs fail, the MDSs fail with errors of the
type "slow metadata, slow requests", but they do not recover once the
cluster has recovered ... Why?
On 5/5/21 at 11:07, Andres Rojas Guerrero wrote:
> Sorry, I have not understood the problem well, the pro
On 5/5/21 at 11:00, Andres Rojas Guerrero wrote:
> Yes, the main problem is that the MDSs start to report slow requests, the
> information is no longer accessible, and the cluster never recovers.
>
>
> # ceph status
> cluster:
> id: c7
> …'ceph health detail' could be useful.
>
> On 05/05 10:48, Andres Rojas Guerrero wrote:
>> Hi, I have a Nautilus cluster version 14.2.6 , and I have noted that
>> when some OSD go down the cluster doesn't start recover. I have checked
>> that the option
Hi, I have a Nautilus cluster version 14.2.6, and I have noticed that
when some OSDs go down the cluster doesn't start to recover. I have checked
that the noout option is unset.
What could be the reason for this behavior?
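(A minimal sketch of what one might check first in this situation; not taken from the original thread:)
# ceph health detail
# ceph osd dump | grep flags        # look for norecover, nobackfill, norebalance as well as noout
# ceph osd tree | grep -i down      # which OSDs and hosts are affected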
> On Mon, Feb 24, 2020 at 9:44 AM Andres Rojas Guerrero wrote:
>
> I have tried to increase to 16, with the same result:
>
That's right!! I will try to update, but now I have the desired PG numbers.
Thank you.
On 25/2/20 at 15:01, Wesley Dillingham wrote:
> ceph osd require-osd-release nautilus
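(For context, a sketch of how one might verify this before and after; not quoted from the thread, and cephfs_data/pg_num 16 simply mirror the commands shown below:)
# ceph osd dump | grep require_osd_release   # check what the cluster currently requires
# ceph osd require-osd-release nautilus
# ceph osd pool set cephfs_data pg_num 16    # retry the pg_num change afterwards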
I have tried to increase to 16, with the same result:
# ceph osd pool set cephfs_data pg_num 16
set pool 1 pg_num to 16
# ceph osd pool get cephfs_data pg_num
pg_num: 8
On 24/2/20 at 15:10, Gabryel Mason-Williams wrote:
> Have you tried making a smaller increment instead of jumping from 8
Hi, I have a Nautilus installation, version 14.2.1, with a very unbalanced
cephfs pool: I have 430 OSDs in the cluster but this pool only has 8 PGs
(pg_num and pgp_num) and 118 TB used:
# ceph -s
cluster:
id: a2269da7-e399-484a-b6ae-4ee1a31a4154
health: HEALTH_WARN
1 nearfull osd(s)