Mike / Martin,
The OSD down behavior Mike is seeing is different. You should be seeing
messages like this in your leader's monitor log:
can_mark_down current up_ratio 0.17 < min 0.3, will not mark osd.2 down
To dampen certain kinds of cascading failures, we are deliberately restricting a
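The guard behind that log line can be sketched roughly as follows. This is a hedged illustration, not Ceph's actual C++ code: the assumption is that the monitor compares the fraction of up OSDs against a configured minimum (`mon osd min up ratio`, 0.3 by default, matching the "min 0.3" in the message) before marking another OSD down.

```python
# Illustrative sketch (not Ceph's real implementation) of the can_mark_down
# guard: once the fraction of up OSDs falls below the minimum up ratio,
# the monitor refuses to mark further OSDs down automatically.
def can_mark_down(num_up: int, num_osds: int, min_up_ratio: float = 0.3) -> bool:
    """Return True if the monitor may mark another OSD down."""
    if num_osds == 0:
        return False
    up_ratio = num_up / num_osds
    # e.g. 11 of 66 up gives up_ratio ~0.17 < min 0.3 -> will not mark down
    return up_ratio >= min_up_ratio
```

With most of a cluster already down, this check explains why manual `ceph osd down` works while automatic down-marking stops.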
The behavior you are seeing is exactly what would be expected if OSDs are not
being marked out. My testing of the fix showed that if a portion of a rack's
OSDs go down, they will be marked out after the configured amount of time (5 min
by default). Once the down OSDs are out, the remaining OSDs tak
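The down-to-out transition described here can be modeled as a simple timer. A toy sketch, assuming the default `mon osd down out interval` of 300 seconds (5 minutes) mentioned above:

```python
# Toy model of the down -> out transition: an OSD that has stayed down
# for the full interval (assumed default: 300 s) gets marked out, after
# which the cluster can re-replicate its data elsewhere.
def should_mark_out(down_since: float, now: float,
                    down_out_interval: float = 300.0) -> bool:
    """Return True once an OSD has been down for the whole interval."""
    return (now - down_since) >= down_out_interval
```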
Sage,
I confirm this issue. The requested info is listed below.
*Note that due to the pre-Cuttlefish monitor sync issues, this
deployment has been running three monitors (mon.b and mon.c in quorum
and working properly; mon.a stuck forever synchronizing).
For the past two hours, no OSD processes
David / Martin,
I can confirm this issue. At present I am running monitors only, with
100% of my OSD processes shut down. For the past couple of hours, Ceph
has reported:
osdmap e1323: 66 osds: 19 up, 66 in
I can mark them down manually using
ceph osd down 0
as expected, but they never get
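For what it's worth, the osdmap line quoted above may already hint at why the monitor will not mark the rest down on its own: 19 up of 66 in is just under 0.3, the assumed default minimum up ratio.

```python
# Arithmetic on the osdmap numbers quoted above: "66 osds: 19 up, 66 in".
up, total = 19, 66
up_ratio = up / total
print(round(up_ratio, 3))   # 0.288
print(up_ratio < 0.3)       # True: below the assumed 0.3 floor
```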
Hi David,
Did you test it with more than one rack as well? In my first problem I
used two racks with a custom crushmap, so that the replicas are in the
two racks (replication level = 2). Then I took one OSD down and expected
that the remaining OSDs in this rack would get the now-missing replicas
I filed tracker bug 4822 and have wip-4822 with a fix. My manual testing shows
that it works. I'm building a teuthology test.
Given that your osd tree has a single rack, it should always mark OSDs out after 5
minutes by default.
David Zafman
Senior Developer
http://www.inktank.com
On Apr 25,
Hi Sage,
On 25.04.2013 18:17, Sage Weil wrote:
> What is the output from 'ceph osd tree' and the contents of your
> [mon*] sections of ceph.conf?
>
> Thanks!
> sage
root@store1:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      24      root default
-3      24
On Thu, 25 Apr 2013, Martin Mailand wrote:
> Hi,
>
> If I shut down an OSD, the OSD gets marked down after 20 seconds; after
> 300 seconds the OSD should get marked out, and the cluster should resync.
> But that doesn't happen: the OSD stays in the status down/in forever, and
> therefore the cluster s
Hi Wido,
I did not set the noosdout flag.
-martin
On 25.04.2013 14:56, Wido den Hollander wrote:
> Could you dump your osdmap? The first 10 lines would be interesting.
> There is a flag where you say "noosdout", could it be that the flag is set?
>
> Wido
epoch 206
fsid 13538f8a-a9b5-4f57-ad72
On 04/25/2013 02:07 PM, Martin Mailand wrote:
Hi,
If I shut down an OSD, the OSD gets marked down after 20 seconds; after
300 seconds the OSD should get marked out, and the cluster should resync.
But that doesn't happen: the OSD stays in the status down/in forever, and
therefore the cluster stays fo
Hi,
If I shut down an OSD, the OSD gets marked down after 20 seconds; after
300 seconds the OSD should get marked out, and the cluster should resync.
But that doesn't happen: the OSD stays in the status down/in forever, and
therefore the cluster stays degraded forever.
I can reproduce it with a new in