The classical way to put a limit on recovery is to use the recovery_time_soft
and recovery_time_hard mount options.
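
As a rough sketch of how those two options might be passed at mount time: the timeout values, device and mount point below are placeholders I made up for illustration, not recommendations, and the command is only printed so it can be reviewed before running it by hand on a server.

```shell
# Hypothetical values; pick numbers that fit your own recovery behaviour.
SOFT=120                      # assumed soft recovery limit, in seconds
HARD=300                      # assumed hard recovery limit, in seconds
DEVICE=/dev/sdX               # placeholder target device
MOUNTPOINT=/mnt/lustre/ost0   # placeholder mount point

# Build the option string for the two recovery mount options.
OPTS="recovery_time_soft=${SOFT},recovery_time_hard=${HARD}"
MOUNT_CMD="mount -t lustre -o ${OPTS} ${DEVICE} ${MOUNTPOINT}"

# Print the command instead of executing it.
echo "${MOUNT_CMD}"
```

The options only take effect when the target is mounted, so an already mounted target would need to be remounted for new values to apply.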
See the mount.lustre options:
https://doc.lustre.org/lustre_manual.xhtml#idm139974521647280
recovery_time_soft=timeout
Allows timeout seconds for clients to reconnect for

The maximum amount of time that recovery will run is controlled by "at_max".
The default is 600s (10 mins), but on my 2-client home cluster (with a
relatively light load) the recovery is usually finished in 10s or less.
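
A minimal sketch of how one might look at the current at_max value and at recent recovery times before changing anything. The new value of 300 is an assumption for illustration only, and the small run helper is hypothetical: it prints each command unless RUN=1 is set, so nothing is changed by accident.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

AT_MAX_NEW=300   # assumed value, in seconds; the default mentioned above is 600

# Show the current adaptive timeout maximum.
run lctl get_param at_max

# Show how recent recoveries went on the OSTs and MDTs of this server.
run lctl get_param "obdfilter.*.recovery_status"
run lctl get_param "mdt.*.recovery_status"

# Lower the limit (not persistent across remounts when set this way).
run lctl set_param at_max="${AT_MAX_NEW}"
```

at_max is also the upper bound for adaptive timeouts in general, so a lower value affects ordinary RPC timeouts as well as recovery.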
You can reduce the timeout based on your typical recovery time. Note that

Hello all,
I'm wondering if there is any way to tune the maximum amount of time that
Lustre will use for a recovery window in the event that imperative recovery
fails due to the failover of an MGS. On MGS failover, we appear to hit a
default timeout of around 6 minutes that seems to be