It's mds_beacon_grace. Set that on the monitor to control the
replacement of laggy MDS daemons. It's usually worth setting it to the
same value on the MDS daemon as well, since the daemon uses it there to
hold off on certain tasks when it hasn't seen a mon beacon recently.
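[A minimal sketch of how that advice might look in ceph.conf, assuming you set it in the config file rather than at runtime; the 15-second value is the documented default and is shown only for illustration:]

```ini
[mon]
    # grace period before the monitors consider an MDS laggy and replace it
    mds_beacon_grace = 15

[mds]
    # keep the MDS's own copy in sync so it backs off correctly
    # when it hasn't heard from the monitors recently
    mds_beacon_grace = 15
```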
John
On Mon, Sep 3, 2018 at 9:26 AM William Lawton wrote:
> Which configuration option determines the MDS timeout period?
From: Gregory Farnum
Sent: Thursday, August 30, 2018 5:46 PM
To: William Lawton
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS does not always failover to hot standby on reboot
> Yes, this is a consequence of co-locating the MDS and monitors.
> If the active MDS is connected to a monitor and they fail at the same time,
> the monitors can't replace the mds until they've been through their own
> election and a full mds timeout window.
So how long are we talking?
--
Bryan Henderson San Jose, California
On Thu, Aug 30, 2018 at 12:46 PM William Lawton wrote:
> Oh I see. We'd taken steps to reduce the risk of losing the active mds and
> mon leader instances at the same time in the hope that it would prevent
> this issue. Do you know if the mds always connects to a specific mon
> instance, i.e. the mon provider, and can it be determined which mon
> instance that is?
Okay, well that will be the same reason then. If the active MDS is
connected to a monitor and they fail at the same time, the monitors can’t
replace the mds until they’ve been through their own election and a full
mds timeout window.
On Thu, Aug 30, 2018 at 11:46 AM William Lawton
wrote:
> Thanks for the response Greg. We did originally have co-located mds and mon
> but realised this wasn't a good idea early on and separated them out onto
> different hosts. So our mds daemons are on ceph-01 and ceph-02, and our mon
> daemons are on ceph-03, 04 and 05. Unfortunately we still see this issue.
Yes, this is a consequence of co-locating the MDS and monitors — if the MDS
reports to its co-located monitor and both fail, the monitor cluster has to
go through its own failure detection and then wait for a full MDS timeout
period after that before it marks the MDS down. :(
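[Greg's timing argument can be sketched as a back-of-the-envelope sum. All values below are illustrative assumptions, not measurements: mds_beacon_grace defaults to 15 seconds, while monitor failure detection and election times depend on mon_lease and network conditions.]

```python
# Back-of-the-envelope worst-case failover time when the active MDS
# and its co-located monitor fail together. Values are assumptions
# for illustration; check `ceph config get` on a real cluster.

mon_failure_detection = 5.0   # assumed: lease expiry before peers notice
mon_election = 5.0            # assumed: time to elect a new mon leader
mds_beacon_grace = 15.0       # default grace before an MDS is marked laggy

# The beacon-grace window only starts counting once the surviving
# monitors have re-formed a quorum, so the terms add rather than overlap.
worst_case_failover = mon_failure_detection + mon_election + mds_beacon_grace
print(worst_case_failover)    # 25.0 seconds under these assumptions
```

[So under these assumed numbers a client could see the filesystem stall for roughly 25 seconds before the standby takes over.]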
We might conceivably
Hi.
We have a 5-node Ceph cluster (refer to the ceph -s output at the bottom of
this email). During resiliency tests we have an occasional problem when we
reboot the active MDS instance and a MON instance together, i.e.
dub-sitv-ceph-02 and dub-sitv-ceph-04. We expect the MDS to fail over to the
standby, but it does not always do so.