Hi all!

I would like to start a discussion with the developers, and especially
with Slurm users and admins, about Slurm high availability (HA).

First, I would like to ask you to share the HA solutions you use on your
clusters. Second, I would like your advice and suggestions on a specific
setup and on what the best HA approach for it would be.

Let's say that we have two management nodes, admin1 and admin2, and that
both local and shared filesystems are available on these nodes. We can
provide NFS exports, DRBD and whatever other services are needed for HA.
admin1 will be the primary controller and admin2 the backup controller. We
want to provide high availability for the slurmctld, slurmdbd and mysqld
daemons. The database files also need an HA approach, most probably on a
shared filesystem.
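
To make the setup concrete, here is a minimal sketch of the Slurm side as
I understand it. The path /shared/slurm/state is only a placeholder for a
directory on the shared filesystem, and the parameter names should be
checked against the man pages of the installed version (newer releases
use SlurmctldHost lines instead of ControlMachine/BackupController). The
sketch does not cover mysqld itself, which is the part I am least sure
about.

```
# slurm.conf (sketch, identical on admin1 and admin2)
ControlMachine=admin1
BackupController=admin2
# Placeholder path: must be on storage that both controllers can read and write
StateSaveLocation=/shared/slurm/state
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=admin1
AccountingStorageBackupHost=admin2

# slurmdbd.conf (sketch)
DbdHost=admin1
DbdBackupHost=admin2
StorageType=accounting_storage/mysql
```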

So, which would be the best approach to provide HA for Slurm? We want a
solution in which the accounting (slurmdbd and its database) is highly
available as well, not only slurmctld.

I look forward to your answers. Thanks in advance!

Best Regards,
Chrysovalantis Paschoulas
Juelich Supercomputing Centre
Forschungszentrum Juelich
