Re: [slurm-users] Database cluster

2024-01-26 Thread Tina Friedrich
We do the same as Josef: we run the database on a VM (single VM, MariaDB) and leave it up to (in our case) VMware to ensure its availability.
Tina
On 25/01/2024 11:34, Josef Dvoracek wrote:
> To protect from HW failure, and to have more free hands when upgrading underlying OS, we use virtualiza

Re: [slurm-users] Database cluster

2024-01-25 Thread Josef Dvoracek
To protect against hardware failure, and to have a freer hand when upgrading the underlying OS, we use virtualization with "live migration"/HA and run the MariaDB server as a VM. A VM is easy to back up, restore from a snapshot, clone for tests, etc. In the past, I deployed (customer-requirement) one site u

Re: [slurm-users] Database cluster

2024-01-24 Thread Henkel, Andreas

Re: [slurm-users] Database cluster

2024-01-23 Thread Daniel L'Hommedieu

Re: [slurm-users] Database cluster

2024-01-23 Thread Xand Meaden
> Community: What do you do to ensure database reliabilit

Re: [slurm-users] Database cluster

2024-01-23 Thread Daniel L'Hommedieu
Hi Diego. In our setup, the database is critical. We have some wrapper scripts that consult the database for information, and we also set environment variables on login, based on user/partition associations. If the database is down, none of those things work. I doubt there is appetite in the
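
For wrapper scripts like the ones described above, one possible mitigation is to probe the accounting daemon before querying it and fall back to a default when it does not answer. The sketch below is only an illustration under assumptions: the function names `dbd_reachable` and `lookup_with_fallback` are hypothetical, the port default of 6819 is slurmdbd's usual default and may differ at a given site, and a successful TCP connect says nothing about whether the MySQL/MariaDB backend behind it is healthy.

```python
import socket


def dbd_reachable(host: str, port: int = 6819, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def lookup_with_fallback(host, query, fallback, port=6819, timeout=2.0):
    """Call query() only if the daemon answers; otherwise return fallback.

    `query` is any zero-argument callable supplied by the wrapper script
    (hypothetical here); `fallback` is the value to use when the daemon
    cannot be reached.
    """
    if dbd_reachable(host, port, timeout):
        return query()
    return fallback
```

A login script could, for example, pass a cached set of environment defaults as `fallback` so that logins still work, in a degraded way, while the database is unavailable.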

Re: [slurm-users] Database cluster

2024-01-23 Thread Diego Zuccato
IIUC the database is not "critical": if it goes down, you lose access to some statistics, but job data gets cached anyway and the DB is updated when it comes back online.
Diego
On 22/01/2024 18:23, Daniel L'Hommedieu wrote:
> Community: What do you do to ensure database reliability i
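
The caching behaviour described in this reply can be pictured as a store-and-forward queue: records pile up while the database is unreachable and are delivered in order once it returns. The following is a toy model only, not Slurm's actual implementation; the class name `AccountingBuffer` and the `send` callable are hypothetical, and the model ignores real-world limits such as how many records can be held before data is lost.

```python
from collections import deque


class AccountingBuffer:
    """Toy store-and-forward queue (illustrative; NOT how Slurm is implemented)."""

    def __init__(self, send):
        # `send` is a callable that raises ConnectionError while the DB is down.
        self._send = send
        self._pending = deque()  # records waiting to reach the database

    def record(self, item):
        """Queue a record and try to deliver everything queued so far."""
        self._pending.append(item)
        self.flush()

    def flush(self):
        """Deliver queued records in order; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # keep the record and retry on a later flush
            self._pending.popleft()
        return True

    def __len__(self):
        return len(self._pending)
```

In this model nothing is lost during an outage as long as the buffer itself survives; the database simply lags behind until `flush()` succeeds again.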

[slurm-users] Database cluster

2024-01-22 Thread Daniel L'Hommedieu
Community: What do you do to ensure database reliability in your SLURM environment? We can have multiple controllers and multiple slurmdbds, but my understanding is that slurmdbd can be configured with a single MySQL server, so what do you do? Do you have that “single MySQL server” be a clust