>In the monitor log you sent along, the monitor was crashing on a
>setcrushmap command. Where in this sequence of events did that happen?
It happened after I tried to upload a different crush map, much later, at step 13.
>Where are you getting these numbers 82-84 and 92-94 from? They don't
appear in any a
In the monitor log you sent along, the monitor was crashing on a
setcrushmap command. Where in this sequence of events did that happen?
On Wed, Jul 17, 2013 at 5:07 PM, Vladislav Gorbunov wrote:
That's what I did:
cluster state HEALTH_OK
1. load crush map from cluster:
https://dl.dropboxusercontent.com/u/2296931/ceph/crushmap1.txt
2. modify the crush map to add a pool and ruleset iscsi with 2
datacenters, then upload the crush map to the cluster:
https://dl.dropboxusercontent.com/u/2296931/ceph/crushma
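For reference, steps 1-2 correspond to the usual crush map edit cycle, roughly like this (a sketch; the file names are illustrative):

```shell
# Sketch of the standard crush map edit cycle (file names are illustrative).
ceph osd getcrushmap -o crushmap.bin        # download the compiled map
crushtool -d crushmap.bin -o crushmap1.txt  # decompile to editable text
# ... edit crushmap1.txt: add the iscsi pool and ruleset ...
crushtool -c crushmap1.txt -o crushmap.new  # recompile the edited map
ceph osd setcrushmap -i crushmap.new        # upload it back to the cluster
```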
On Wed, Jul 17, 2013 at 4:40 AM, Vladislav Gorbunov wrote:
Sorry, this was not sent to ceph-users earlier.
I checked the mon.1 log and found that the cluster was not in HEALTH_OK
when I set the ruleset for iscsi:
2013-07-14 15:52:15.715871 7fe8a852a700 0 log [INF] : pgmap
v16861121: 19296 pgs: 19052 active+clean, 73
active+remapped+wait_backfill, 171 active+remapped+backfilling; 9
Yes, I changed the original crushmap; I needed to take the nodes gstore1,
gstore2, and cstore5 for a new cluster. I only have the crushmap from the
failed cluster, downloaded immediately after the cluster crashed. It is attached.
2013/7/17 Gregory Farnum :
output is in the attached files
2013/7/17 Gregory Farnum :
Have you changed either of these maps since you originally switched to
use rule 3?
Can you compare them to what you have on your test cluster? In
particular I see that you have 0 weight for all the buckets in the
crush pool, which I expect to misbehave but not to cause the OSD to
crash everywhere.
The maps in the OSDs only would have gotten there from the monitors.
If a bad map somehow got distributed to the OSDs then cleaning it up
is unfortunately going to take a lot of work without any well-defined
processes.
So if you could just do "ceph osd crush dump" and "ceph osd dump" and
provide th
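The two dumps being asked for can be captured to files for attachment, for example (output file names are illustrative):

```shell
# Capture the requested cluster state to files for attachment.
ceph osd crush dump > crush-dump.json  # decoded crush map as JSON
ceph osd dump       > osd-dump.txt     # osdmap: pools, rulesets, osd states
```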
Gregory, thank you for your help!
After all the osd servers went down, I set the rule for the iscsi pool
back to the default rule 0:
ceph osd pool set iscsi crush_ruleset 0
It did not help: none of the osds started, except those without data, with weight 0.
Next I removed the ruleset iscsi from the crush map. It did not help to
I notice that your first dump of the crush map didn't include rule #3.
Are you sure you've injected it into the cluster? Try extracting it
from the monitors and looking at that map directly, instead of a
locally cached version.
You mentioned some problem with OSDs being positioned wrong too, so
you
ruleset 3 is:
rule iscsi {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take iscsi
        step chooseleaf firstn 0 type datacenter
        step chooseleaf firstn 0 type host
        step emit
}
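For comparison, a two-level rule of this shape is more commonly written with a plain `choose` at the datacenter level and `chooseleaf` only at the host level, since `chooseleaf` already descends to devices and leaves nothing for a second `chooseleaf` step to walk into. A sketch of that more common form (an assumption on my part, not verified against this cluster):

```
rule iscsi {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take iscsi
        step choose firstn 0 type datacenter
        step chooseleaf firstn 0 type host
        step emit
}
```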
2013/7/16 Vladislav Gorbunov :
Sorry, it was after I tried to apply crush ruleset 3 (iscsi) to the
pool iscsi:
ceph osd pool set iscsi crush_ruleset 3
2013/7/16 Vladislav Gorbunov :
>Have you run this crush map through any test mappings yet?
Yes, it worked on the test cluster, and after that I applied the map to
the main cluster. The OSD servers went down after I tried to apply crush
ruleset 3 (iscsi) to pool iscsi:
ceph osd pool set data crush_ruleset 3
2013/7/16 Gregory Farnum :
It's probably not the same issue as that ticket, which was about the
OSD handling a lack of output incorrectly. (It might be handling the
output incorrectly in some other way, but hopefully not...)
Have you run this crush map through any test mappings yet?
-Greg
Software Engineer #42 @ http://inkt
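Test mappings can be generated offline with crushtool before a map is injected, roughly like this (a sketch; file names are illustrative):

```shell
# Compile the edited map and exercise rule 3 without touching the cluster.
crushtool -c crushmap1.txt -o crushmap.bin
crushtool --test -i crushmap.bin --rule 3 --num-rep 2 --show-bad-mappings
```

`--show-bad-mappings` prints only the inputs that failed to map the requested number of replicas, which makes a broken rule obvious before it reaches the monitors.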
Symptoms like in http://tracker.ceph.com/issues/4699:
on all OSDs the ceph-osd process crashes with a segfault.
If I stop the MON daemons, I can start the OSDs, but if I start the MONs
again, all the OSDs die again.
A more detailed log:
0> 2013-07-15 16:42:05.001242 7ffe5a6fc700 -1 *** Caught signal
(Segmenta
Hello!
After changing the crush map, all osds (ceph version 0.61.4
(1669132fcfc27d0c0b5e5bb93ade59d147e23404)) on the default pool crash
with the error:
2013-07-14 17:26:23.755432 7f0c963ad700 -1 *** Caught signal
(Segmentation fault) **
in thread 7f0c963ad700
...skipping...
10: (OSD::PeeringWQ::_p