Dear All,

I can contribute a few simple scripts that coordinate the start / stop of the
whole Lustre file system. Everyone is welcome to use them or modify them to
fit your own system. Sorry that I did not prepare a complete document for
these scripts; here I only describe their usage briefly. If you are
interested in more details, I will be happy to answer here.

- server:/opt/lustre/etc/cfs-chome:
   The configuration file, where the Lustre file system is named "chome".
The head node is named "server", which is also one of the Lustre clients.
This file lists all the MGS, MDS, OSS, and Lustre clients. If the MGS and
MDS have both ethernet and InfiniBand networks, you can specify their IPs
explicitly. If any MDTs or OSTs were formatted with ZFS, you can list them
as well.
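The actual format is defined by the attached cfs-chome file; purely as an
illustration of the kind of information it carries, a sketch might look like
this (every name below is a made-up placeholder):

```shell
# Illustrative sketch only -- the real syntax is whatever the attached
# cfs-chome file uses. All host, pool, and NID names are placeholders.
FSNAME="chome"                 # Lustre file system name
MGS="mgs1"                     # optionally with an explicit NID, e.g. mgs1@o2ib
MDS="mds1"
OSS="oss1 oss2"
CLIENTS="server node01 node02" # "server" is the head node, also a client
ZFS_TARGETS="mds1:mdtpool"     # MDTs/OSTs formatted with ZFS, if any
```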

- server:/opt/lustre/etc/cfsd:
   The main script to coordinate the start / stop / shutdown (emergency
shutdown) of the Lustre system, run on the head node. The usage is:
   # cd /opt/lustre/etc/
   # ./cfsd start chome
   # ./cfsd stop chome
   # ./cfsd shutdown

   When doing "start", it performs the following procedure (the script will
ssh into each file server and client to do the mounts):
   1. If some of the MDTs/OSTs are based on ZFS, it starts ZFS on those
hosts first.
   2. Mount the MGT, MDT, and OSTs, in that order.
   3. Mount all the clients.
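The ordering above can be sketched roughly as follows. This is not the
attached cfsd script: the host names, device paths, and pool name are
invented placeholders, and it dry-runs (prints the commands) by default.

```shell
#!/bin/sh
# Hypothetical sketch of the "start" ordering only -- not the attached cfsd.
# Hosts (mgs1, mds1, oss1...), devices, and the pool name are placeholders.
RUN=${RUN:-echo}   # dry-run by default; set RUN=ssh to really execute

start_sequence() {
    # 1. Import ZFS pools first on any ZFS-backed MDT/OST hosts
    $RUN mds1 "zpool import mdtpool"

    # 2. Mount the MGT, MDT, then the OSTs, in that order
    $RUN mgs1 "mount -t lustre /dev/sdb /mnt/mgt"
    $RUN mds1 "mount -t lustre mdtpool/mdt0 /mnt/mdt"
    for oss in oss1 oss2; do
        $RUN "$oss" "mount -t lustre /dev/sdb /mnt/ost0"
    done

    # 3. Only after all targets are up, mount the clients
    for c in server node01; do
        $RUN "$c" "mount -t lustre mgs1@tcp:/chome /chome"
    done
}

start_sequence
```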

   When doing "stop", it reverses the above procedure to unmount everything.

   "shutdown" is usually used when the air conditioner of the machine room
is broken and the whole room is in an emergency state where we need to shut
down the whole system as fast as possible:
   1. Shut down all the clients immediately (the head node only unmounts
Lustre without powering off).
   2. Unmount all the OSTs, MDT, and MGT, and then shut down these servers.
   3. Shut down the head node.
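The emergency ordering can likewise be sketched as below. Again this is not
the attached script: hosts and mount points are placeholders, and it only
prints the commands unless RUN is overridden.

```shell
#!/bin/sh
# Hypothetical sketch of the "shutdown" ordering only -- not the attached
# cfsd. Host names and mount points are invented placeholders.
RUN=${RUN:-echo}   # dry-run by default; set RUN=ssh to really execute

shutdown_sequence() {
    # 1. Shut down every client immediately; the head node "server" only
    #    unmounts Lustre so it can keep coordinating the rest.
    $RUN server "umount /chome"
    $RUN node01 "umount /chome && poweroff"

    # 2. Unmount OSTs, then MDT, then MGT, powering each server off
    $RUN oss1 "umount /mnt/ost0 && poweroff"
    $RUN mds1 "umount /mnt/mdt && poweroff"
    $RUN mgs1 "umount /mnt/mgt && poweroff"

    # 3. Finally power off the head node itself
    $RUN server "poweroff"
}

shutdown_sequence
```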

- client:/etc/init.d/lustre_mnt:
   Sometimes the clients have to be rebooted, and we want them to mount
Lustre automatically, or to unmount Lustre correctly during shutdown. This
script does that work. It reads /opt/lustre/etc/cfs-chome to check whether
all the file servers are alive, determines whether it should mount Lustre
through ethernet or InfiniBand, and does the mount. When unmounting, it
also unloads all the Lustre kernel modules afterwards. The usage is:
   # /etc/init.d/lustre_mnt start
   # /etc/init.d/lustre_mnt stop
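A hypothetical skeleton of such an init script follows. It is not the
attached lustre_mnt: the MGS NID (mgs1), mount point (/chome), and the way
InfiniBand is detected are all assumptions, and it dry-runs by default.

```shell
#!/bin/sh
# Hypothetical skeleton, not the attached lustre_mnt script. The MGS NID,
# mount point, and InfiniBand check below are invented placeholders.
RUN=${RUN:-echo}   # dry-run by default; set RUN= (empty) to really execute

lustre_mnt() {
    case "${1:-}" in
    start)
        # Prefer InfiniBand (o2ib) when an IB device exists, else TCP.
        if [ -d /sys/class/infiniband ]; then
            nid="mgs1@o2ib"
        else
            nid="mgs1@tcp"
        fi
        $RUN mount -t lustre "$nid:/chome" /chome
        ;;
    stop)
        $RUN umount /chome
        # After unmounting, unload all Lustre kernel modules as well.
        $RUN lustre_rmmod
        ;;
    *)
        echo "Usage: lustre_mnt {start|stop}" >&2
        return 1
        ;;
    esac
}

lustre_mnt start   # with RUN=echo this only prints the mount command
```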

- client:/etc/systemd/system/sysinit.target.wants/lustre_mnt.service:
   If the client has an InfiniBand network, it is very annoying that systemd
will stop OpenIB quite early, before the Lustre mounts are shut down, and
then hang the system without powering off. Hence, this file tells systemd to
wait for "/etc/init.d/lustre_mnt stop" to finish before proceeding with the
shutdown of OpenIB.
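One way to express that ordering in a unit file is sketched below. This is
illustrative only: it assumes the IB stack is provided by an
openibd.service-style unit, and the attached lustre_mnt.service may look
quite different. (systemd reverses After= ordering at shutdown, so ExecStop
runs before OpenIB is stopped.)

```ini
# Illustrative sketch only -- the attached lustre_mnt.service may differ.
[Unit]
Description=Mount/unmount Lustre before the IB stack goes away
After=openibd.service network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/init.d/lustre_mnt start
ExecStop=/etc/init.d/lustre_mnt stop

[Install]
WantedBy=sysinit.target
```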

Please note that these scripts may have bugs when used in different
environments. Also note that these scripts do not implement Lustre HA
(because we do not have it). Any suggestions would be greatly appreciated,
and I will be very happy if you find them useful.

Cheers,

T.H.Hsieh

Bertschinger, Thomas Andrew Hjorth via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote on Thu, Dec 7, 2023 at 12:01 AM:

> Hello Jan,
>
> You can use the Pacemaker / Corosync high-availability software stack for
> this: specifically, ordering constraints [1] can be used.
>
> Unfortunately, Pacemaker is probably over-the-top if you don't need HA --
> its configuration is complex and difficult to get right, and it
> significantly complicates system administration. One downside of Pacemaker
> is that it is not easy to decouple the Pacemaker service from the Lustre
> services, meaning if you stop the Pacemaker service, it will try to stop
> all of the Lustre services. This might make it inappropriate for use cases
> that don't involve HA.
>
> Given those downsides, if others in the community have suggestions on
> simpler means to accomplish this, I'd love to see other tools that can be
> used here (especially officially supported ones, if they exist).
>
> [1]
> https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/constraints.html#specifying-the-order-in-which-resources-should-start-stop
>
> - Thomas Bertschinger
>
> ________________________________________
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf
> of Jan Andersen <j...@comind.io>
> Sent: Wednesday, December 6, 2023 3:27 AM
> To: lustre
> Subject: [EXTERNAL] [lustre-discuss] Coordinating cluster start and
> shutdown?
>
> Are there any tools for coordinating the start and shutdown of lustre
> filesystem, so that the OSS systems don't attempt to mount disks before the
> MGT and MDT are online?
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

Attachment: cfs-chome
Description: Binary data

Attachment: cfsd
Description: Binary data

Attachment: lustre_mnt
Description: Binary data

Attachment: lustre_mnt.service
Description: Binary data

