Re: [slurm-users] [EXTERNAL] SlurmDBD losing connection to the backend MariaDB

2022-11-01 Greg Wickham
Hi Richard, While trying to respond, I looked through the manual pages, and while it does appear that Slurm supports some form of high availability (*), it doesn’t seem simple. With multiple slurmctld daemons, only one can be active at any time because they share state information. It’s not clear how they

2022-11-01 Richard Chang
Hello Greg, I have a two-node setup: node1 is the primary slurmctld and backup slurmdbd, and node2 is the primary slurmdbd, backup slurmctld, and MySQL database host. My concern is what will happen if node2 goes down and the backup slurmdbd takes over. I have read that slurmctld can

2022-10-31 Greg Wickham
Hi Richard, Slurmctld caches the updates until slurmdbd comes back online. You can see how many records are pending for the database by running the “sdiag” command and looking for “DBD Agent queue size”. If this number grows significantly, it means that slurmdbd isn’t available. -Greg
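The check Greg describes can be scripted, for example from cron or a monitoring agent. The following is a minimal sketch, not an official Slurm tool. It assumes that the sdiag output contains a line of the form `DBD Agent queue size: N`. The alert threshold `QUEUE_ALERT_THRESHOLD` and the helper name `dbd_agent_queue_size` are hypothetical and chosen only for illustration.

```python
import re
import subprocess
import sys
from typing import Optional

# Hypothetical alert threshold (not a Slurm default); tune it for your site.
QUEUE_ALERT_THRESHOLD = 1000

# Assumes sdiag prints a line like "DBD Agent queue size: 42".
_QUEUE_RE = re.compile(r"DBD Agent queue size:\s*(\d+)")


def dbd_agent_queue_size(sdiag_output: str) -> Optional[int]:
    """Return the DBD Agent queue size parsed from sdiag output, or None if absent."""
    match = _QUEUE_RE.search(sdiag_output)
    return int(match.group(1)) if match else None


def main() -> int:
    # Run sdiag on a host that can reach slurmctld and capture its text output.
    output = subprocess.run(
        ["sdiag"], capture_output=True, text=True, check=True
    ).stdout
    size = dbd_agent_queue_size(output)
    if size is None:
        print("DBD Agent queue size not found in sdiag output", file=sys.stderr)
        return 2
    print(f"DBD Agent queue size: {size}")
    if size > QUEUE_ALERT_THRESHOLD:
        # A large backlog suggests slurmctld is caching records because slurmdbd is unreachable.
        print("Queue above threshold: slurmdbd may be unavailable", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A single reading only shows the current backlog. Because the warning sign is a queue that keeps *growing*, you may prefer to record successive readings and alert on a sustained increase instead of a fixed threshold.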