But in another cluster with version 14.2.16 it works ... it seems to be a problem with version 14.2.6?
On 6/5/21 at 18:28, Clyso GmbH - Ceph Foundation Member wrote:
Hi Andres,
does the command work with the original rule/crushmap?
___
Clyso GmbH - Ceph Foundation Member
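If the segfault reported further down the thread really is specific to the 14.2.6 crushtool binary, one possible workaround is to copy the compiled map to a host running the newer release and run the same test there (the hostname below is only a placeholder):

# scp crush_map_new host-14-2-16:/tmp/
# ssh host-14-2-16 crushtool -i /tmp/crush_map_new --test --rule 2 --num-rep 7 --show-bad-mappings

The compiled map file is self-contained, so it can be tested with any crushtool build without access to the original cluster.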
No, it doesn't work with an unedited crush map file.
On 6/5/21 at 18:28, Clyso GmbH - Ceph Foundation Member wrote:
Hi Andres,
does the command work with the original rule/crushmap?
___
Clyso GmbH - Ceph Foundation Member
supp...@clyso.com
https://www.clyso.com
Hi Andres,
does the command work with the original rule/crushmap?
___
Clyso GmbH - Ceph Foundation Member
supp...@clyso.com
https://www.clyso.com
On 06.05.2021 at 15:21, Andres Rojas Guerrero wrote:
Yes, my ceph version is Nautilus:
# ceph -v
ceph version 14
Yes, my ceph version is Nautilus:
# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus
(stable)
First dump the crush map:
# ceph osd getcrushmap -o crush_map
Then, decompile the crush map:
# crushtool -d crush_map -o crush_map_d
Now, edit the crush rule and co
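The rest of that round trip is cut off above; assuming the file names from the previous commands, the remaining steps would typically be:

# crushtool -c crush_map_d -o crush_map_new    # recompile the edited text map
# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-bad-mappings
# ceph osd setcrushmap -i crush_map_new        # inject only after the test output looks clean

crush_map_new is just an assumed name for the recompiled output.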
Interesting, I haven't had that yet with crushtool. Your ceph version
is Nautilus, right? And you did decompile the binary crushmap with
crushtool, correct? I don't know how to reproduce that.
Quoting Andres Rojas Guerrero:
I have this error when trying to show mappings with crushtool:
# c
I have this error when trying to show mappings with crushtool:
# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
in thread 7f7f7a0ccb40 thread_name:crushtool
On 6/5/21 at 13:47, Eugen Block wrote:
Ok, thank you very much for the answer.
On 6/5/21 at 13:47, Eugen Block wrote:
> Yes it is possible, but you should validate it with crushtool before
> injecting it to make sure the PGs land where they belong.
>
> crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
> crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-bad-mappings
Yes it is possible, but you should validate it with crushtool before
injecting it to make sure the PGs land where they belong.
crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-bad-mappings
If you don't get bad mappings, it should be safe to inject the new crushmap.
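For a broader check it can also help to raise the number of sampled inputs; a small sketch (the --min-x/--max-x range is arbitrary, the file name is the one used above):

# crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --min-x 0 --max-x 1023 --show-bad-mappings

Any line reported by --show-bad-mappings means the rule could not place all 7 chunks for that input, so the map should not be injected yet.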
Hi, I'm trying to make a new crush rule (Nautilus) in order to move the failure domain to hosts:
"rule_id": 2,
"rule_name": "nxtcloudAFhost",
"ruleset": 2,
"type": 3,
"min_size": 3,
"max_size": 7,
"steps": [
{
"op":
Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).
--
Best regards, Joachim Kraftmayer
___
Clyso GmbH
On 05.05.2021 at 15:11, Andres Rojas Guerrero wrote:
Nice observation, how can I avoid this problem?
On 5/5/21 ...
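A minimal sketch of Joachim's suggestion (new rule with host failure domain, validate it, then assign it to the pool) with the ceph CLI; all names below are placeholders, and k=4/m=3 is only chosen to match the 7 chunks used elsewhere in this thread:

# ceph osd erasure-code-profile set nxtcloud-host-profile k=4 m=3 crush-failure-domain=host
# ceph osd crush rule create-erasure nxtcloudAFhost nxtcloud-host-profile
# ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost    # only after validating the new rule with crushtool as described above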
Thanks, I will test it.
On 5/5/21 at 16:37, Joachim Kraftmayer wrote:
Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).
--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arqu
Hi Andres,
the crush rule with ID 1 distributes your EC chunks over the osds
without considering the ceph host. As Robert already suspected.
Greetings, Joachim
___
Clyso GmbH
Homepage: https://www.clyso.com
On 05.05.2021 at 13:16, Andres Rojas Guerrero wrote:
Nice observation, how can I avoid this problem?
On 5/5/21 at 14:54, Robert Sander wrote:
Hi,
On 05.05.21 at 13:39, Joachim Kraftmayer wrote:
the crush rule with ID 1 distributes your EC chunks over the osds
without considering the ceph host. As Robert already suspected.
Yes, the "nxt
Hi,
On 05.05.21 at 13:39, Joachim Kraftmayer wrote:
> the crush rule with ID 1 distributes your EC chunks over the osds
> without considering the ceph host. As Robert already suspected.
Yes, the "nxtcloudAF" rule is not fault tolerant enough. Having the OSD
as the failure zone will lead to data loss.
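To double-check what the current rule and profile actually use as the failure domain, something like the following can help (the pool and profile names are placeholders; the rule name is taken from the thread):

# ceph osd crush rule dump nxtcloudAF            # the "type" in the choose/chooseleaf step is the failure domain
# ceph osd pool get <pool> erasure_code_profile
# ceph osd erasure-code-profile get <profile>    # look at crush-failure-domain (osd vs host)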
# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_n
On 05.05.21 at 12:34, Andres Rojas Guerrero wrote:
> Thanks for the answer.
>
>> For the default redundancy rule and pool size 3 you need three separate
>> hosts.
>
> I have 24 separate server nodes with 32 osd in each one, 768 osd in total;
> my question is why the mds suffers when only 4%
Thanks for the answer.
> For the default redundancy rule and pool size 3 you need three separate
> hosts.
I have 24 separate server nodes with 32 osd in each one, 768 osd in total;
my question is why the mds suffers when only 4% of the osds go down (in the
same node). I need to modify the cr
Hi,
On 05.05.21 at 11:44, Andres Rojas Guerrero wrote:
> I have 768 OSDs in the cluster; it is enough that 32 (~4%) of them (in the
> same node) fail for the information to become inaccessible. Is it possible
> to improve this behavior?
You need to spread your failure zone in the crush map. It loo
I have 768 OSDs in the cluster; it is enough that 32 (~4%) of them (in the
same node) fail for the information to become inaccessible. Is it possible
to improve this behavior?
# ceph status
  cluster:
    id:     c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
            1 clients fail
They are located on a single node ...
On 5/5/21 at 11:17, Burkhard Linke wrote:
> Hi,
>
> On 05.05.21 11:07, Andres Rojas Guerrero wrote:
>> Sorry, I have not understood the problem well; the problem I see is that
>> once the OSDs fail, the cluster recovers but the MDS remains faulty:
>
>
Hi,
On 05.05.21 11:07, Andres Rojas Guerrero wrote:
Sorry, I have not understood the problem well; the problem I see is that
once the OSDs fail, the cluster recovers but the MDS remains faulty:
*snipsnap*
pgs: 1.562% pgs not active
16128 active+clean
238
I see that problem: when the osds fail, the mds fail with errors of type
"slow metadata, slow requests" but do not recover once the cluster has
recovered ... Why?
On 5/5/21 at 11:07, Andres Rojas Guerrero wrote:
> Sorry, I have not understood the problem well; the problem I see is that
I think that the recovery might be blocked due to all those PGs in inactive
state:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/monitoring-a-ceph-storage-cluster#identifying-stuck-placement-groups_admin
"""
Inactive: Placement groups cannot proc
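Along the same lines as that documentation, a couple of commands that usually show which PGs are stuck and why (the pg id in the last line is just an example):

# ceph health detail
# ceph pg dump_stuck inactive      # PGs that have been inactive longer than the threshold
# ceph pg 2.1f query               # for one of the listed PGs; the output explains why peering is blocked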
Sorry, I have not understood the problem well; the problem I see is that
once the OSDs fail, the cluster recovers but the MDS remains faulty:
# ceph status
  cluster:
    id:     c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
            3 clients failing to respond to capability release
Yes, the main problem is that the MDS starts to report slow requests, the
information is no longer accessible, and the cluster never recovers.
# ceph status
  cluster:
    id:     c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
            2 clients failing to respond to capability release
Can you share more information?
The output of 'ceph status' when the osds are down would help; 'ceph health
detail' could also be useful.
On 05/05 10:48, Andres Rojas Guerrero wrote:
> Hi, I have a Nautilus cluster version 14.2.6, and I have noted that
> when some OSDs go down the cluster doesn't