Gary,

Would it be possible to get some additional details on your experience with DRBD?

Thank you.

--
Davide Vanzo, PhD
Application Developer
Adjunct Assistant Professor of Chemical and Biomolecular Engineering
Advanced Computing Center for Research and Education (ACCRE)
Vanderbilt University - Hill Center 201
(615)-875-9137
www.accre.vanderbilt.edu

On 2017-07-25 11:13:30-05:00 Skouson, Gary B wrote:

We use an NFS appliance for storing state files. The NFS appliance has been VERY stable. We tried a DRBD shared volume but found that our problems were more likely to come from DRBD than from slurmctld.

-----
Gary Skouson

From: Vanzo, Davide [mailto:davide.va...@vanderbilt.edu]
Sent: Tuesday, July 25, 2017 8:39 AM
To: slurm-dev <slurm-dev@schedmd.com>
Cc: slurm-dev@schedmd.com
Subject: [slurm-dev] RE: Slurm with High Availability/Automatic failover

We are currently experimenting with keepalived+DRBD to build a two-node HA cluster in which the controller and the database are hosted on the same node. We are pursuing this route because we experienced significant performance and stability issues when keeping the state files on the cluster's parallel filesystem. We are still in the early stages of testing, but I will be happy to share our experience if you are interested.

On 2017-07-25 09:20:55-05:00 J. Smith wrote:

Does anyone have any suggestions for setting up high availability and automatic failover between two servers that run a controller daemon, a database daemon, and a MySQL database (e.g., replication vs. Galera Cluster)? Any input would be appreciated.

Thanks!