Does it mean it is best to use a single slurmdbd host in my case?

My primary slurmctld host is also the backup slurmdbd host. My worry is this: if the primary slurmdbd host (which is also the MariaDB server) goes down, will the backup slurmdbd be able to cache data and wait until MariaDB is back up?

Thanks,

RC

On 11/2/2022 2:00 AM, Brian Andrus wrote:
Ole,

Fair enough, it is actually slurmctld that does the caching. Technical typo on my part there.

Just trying to let the user know that the cache only covers a limited window: the database has to be back within it to ensure no information is lost during an outage.

Brian Andrus

On 11/1/2022 1:43 AM, Ole Holm Nielsen wrote:
Hi Brian,

On 11/1/22 05:28, Brian Andrus wrote:
It caches up to a point. As I understand it, that is about an hour (depending on size and how busy the cluster is, as well as available memory, etc).

Have you found any documentation of slurmdbd caching?  It's well-known that slurmctld caches information while slurmdbd is down, see for example page 30 in the talk "Field Notes Mark 2: Random Musings From Under A New Hat"[1] by Tim Wickberg, SchedMD:

For slurmdbd, the critical element in the failure domain is
MySQL, not slurmdbd. slurmdbd itself is stateless.
● slurmctld will cache accounting records (up to a limit) if
slurmdbd is unavailable. This can be hours+ to days+
depending on your system without data loss.

The statelessness of slurmdbd makes me think that it can't cache any data.
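
Since the caching happens in slurmctld, one way to watch how much is queued during an outage is to read the DBD agent queue size that sdiag reports. Below is a minimal sketch in Python, assuming sdiag prints a line of the form "DBD Agent queue size: N" (check the output on your Slurm version). The function names are made up for illustration. The upper limit on the queue is, as far as I know, set by MaxDBDMsgs in slurm.conf.

```python
import re
import subprocess
from typing import Optional


def dbd_agent_queue_size(sdiag_output: str) -> Optional[int]:
    """Return the 'DBD Agent queue size' value found in sdiag text.

    Returns None if no such line is present. The label is an assumption
    about the sdiag output format; verify it on your installation.
    """
    match = re.search(
        r"^\s*DBD Agent queue size:\s*(\d+)", sdiag_output, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def read_sdiag() -> str:
    """Run sdiag on a host that can reach slurmctld and return its output."""
    result = subprocess.run(
        ["sdiag"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a cluster you would call dbd_agent_queue_size(read_sdiag()) periodically. A value that keeps growing while the database is down means slurmctld is still queueing records for slurmdbd.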

Thanks,
Ole

[1] https://slurm.schedmd.com/publications.html

On 10/31/2022 9:20 PM, Richard Chang wrote:
Hi,

Just for my information, I would like to know what happens when SlurmDBD loses its connection to the backend database, for example MariaDB.

Does it cache the accounting info and keep it until the DB comes back up, or does it panic and shut down?


