Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-04 Thread John Spray
It's mds_beacon_grace. Set it on the monitors to control the replacement of laggy MDS daemons. You should usually set it to the same value on the MDS daemons as well, because the MDS uses it to hold off on certain tasks if it hasn't seen a mon beacon recently. John On Mon, Sep 3, 2018 at 9:26 AM

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-03 Thread William Lawton
Which configuration option determines the MDS timeout period?
William Lawton
From: Gregory Farnum
Sent: Thursday, August 30, 2018 5:46 PM
To: William Lawton
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS does not always failover to hot standby on reboot
Yes, this is a consequence

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-01 Thread Bryan Henderson
> If the active MDS is connected to a monitor and they fail at the same time,
> the monitors can't replace the mds until they've been through their own
> election and a full mds timeout window.
So how long are we talking?
-- Bryan Henderson San Jose, California
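A rough way to put numbers on the quoted explanation, as a sketch only: it assumes the worst-case delay is simply the monitor election time plus one full mds_beacon_grace window. The function and class names are made up for illustration, the election time is a value you would have to measure on your own cluster, and the 15-second default used for mds_beacon_grace is an assumption to check against your release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverEstimate:
    """Hypothetical container for the two delays named in the thread."""

    election_seconds: float      # time for the monitors to finish their own election
    beacon_grace_seconds: float  # value of mds_beacon_grace on the monitors

    @property
    def worst_case_seconds(self) -> float:
        # Assumption taken from the quoted text: the election completes first,
        # and only then does a full MDS timeout window start counting.
        return self.election_seconds + self.beacon_grace_seconds


def estimate_failover_delay(election_seconds: float,
                            beacon_grace_seconds: float = 15.0) -> FailoverEstimate:
    """Return a simple additive estimate; not a model of the real monitor code."""
    if election_seconds < 0:
        raise ValueError("election_seconds must not be negative")
    if beacon_grace_seconds <= 0:
        raise ValueError("beacon_grace_seconds must be positive")
    return FailoverEstimate(election_seconds, beacon_grace_seconds)


if __name__ == "__main__":
    # The 10-second election time here is an illustrative guess, not a measurement.
    estimate = estimate_failover_delay(election_seconds=10.0)
    print(f"worst case before the MDS is marked down: "
          f"{estimate.worst_case_seconds:.0f} s")
```

The point of the additive form is that lowering mds_beacon_grace only shortens the second term; the election time still has to elapse first when the MDS's monitor goes down with it.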

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread Gregory Farnum
On Thu, Aug 30, 2018 at 12:46 PM William Lawton wrote:
> Oh I see. We’d taken steps to reduce the risk of losing the active mds and
> mon leader instances at the same time in the hope that it would prevent
> this issue. Do you know if the mds always connects to a specific mon
> instance i.e. the

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread William Lawton
Oh I see. We’d taken steps to reduce the risk of losing the active mds and mon leader instances at the same time, in the hope that it would prevent this issue. Do you know if the mds always connects to a specific mon instance, i.e. the mon provider, and can it be determined which mon instance that

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread Gregory Farnum
Okay, well that will be the same reason then. If the active MDS is connected to a monitor and they fail at the same time, the monitors can’t replace the mds until they’ve been through their own election and a full mds timeout window. On Thu, Aug 30, 2018 at 11:46 AM William Lawton wrote: >

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread William Lawton
Thanks for the response Greg. We did originally have co-located mds and mon but realised this wasn't a good idea early on and separated them out onto different hosts. So our mds hosts are on ceph-01 and ceph-02, and our mon hosts are on ceph-03, 04 and 05. Unfortunately we see this issue

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread Gregory Farnum
Yes, this is a consequence of co-locating the MDS and monitors — if the MDS reports to its co-located monitor and both fail, the monitor cluster has to go through its own failure detection and then wait for a full MDS timeout period after that before it marks the MDS down. :( We might conceivably

[ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread William Lawton
Hi. We have a 5-node Ceph cluster (refer to ceph -s output at bottom of email). During resiliency tests we have an occasional problem when we reboot the active MDS instance and a MON instance together, i.e. dub-sitv-ceph-02 and dub-sitv-ceph-04. We expect the MDS to fail over to the standby