[slurm-users] Question about CPUs and cores

2024-01-25 Thread Gestió Servidors
Hi, I want to run a simple test that uses one node and four cores. In my script, I also execute a binary that reports which core one of the four tasks is running on. These are my files: * submit script: #!/bin/bash #SBATCH --job-name=test_jobs # Job name #SBATCH --output=test_jo
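For readers trying a similar test, the script below is a minimal sketch of a per-task reporter. It is not the poster's binary, and the file name `report_cpu.sh` is hypothetical. It assumes Linux `/proc`. It prints the task rank from `SLURM_PROCID` (set by `srun` for each task), the CPU the process last ran on (field 39 of `/proc/<pid>/stat`), and the CPUs the process is allowed to use (`Cpus_allowed_list` in `/proc/self/status`).

```shell
#!/bin/bash
# report_cpu.sh -- hypothetical helper; a sketch assuming Linux /proc.
# Prints the Slurm task rank, the host, the CPU this process last ran on,
# and the CPUs it is allowed to run on (its affinity list).

rank="${SLURM_PROCID:--}"   # set by srun for each task; "-" outside a job step

# Field 39 of /proc/<pid>/stat is the CPU the process last executed on.
# Strip the leading "pid (comm) " first, because comm may contain spaces;
# field 39 then becomes the 37th remaining field (array index 36).
read -r stat < "/proc/$$/stat"
read -ra fields <<< "${stat##*") "}"
cpu="${fields[36]}"

# Cpus_allowed_list is the affinity list, e.g. "0-3" or "2".
allowed=""
while read -r key value; do
    if [[ $key == "Cpus_allowed_list:" ]]; then
        allowed="$value"
    fi
done < /proc/self/status

echo "task ${rank} on ${HOSTNAME}: last ran on CPU ${cpu}, allowed CPUs ${allowed}"
```

In a batch script that requests `#SBATCH --nodes=1` and `#SBATCH --ntasks=4`, running `srun ./report_cpu.sh` should print one line per task. Unless the tasks are bound to cores (for example, with `srun --cpu-bind=cores`), the "last ran on" CPU is only a snapshot, and the kernel may move the task later.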

[slurm-users] slurmctld: slurm_bufs_sendto(msg_type=SRUN_STEP_SIGNAL) failed: Connection reset by peer

2024-01-25 Thread Rike-Benjamin Schuppner
Hi, I am getting the following error in the logs whenever I run a few srun jobs in a batch. Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: debug: _send_timeout: Socket POLLERR: Connection reset by peer Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: error: slurm_s

Re: [slurm-users] Database cluster

2024-01-25 Thread Josef Dvoracek
To protect against hardware failure, and to have a freer hand when upgrading the underlying OS, we use virtualization with "live migration"/HA and run the MariaDB server as a VM. A VM is easy to back up, restore from a snapshot, clone for possible tests, etc. In the past, I deployed (a customer requirement) one site u

[slurm-users] Problem using Podman with scrun on SLURM 23.11.3

2024-01-25 Thread Marcus Lauer
I am getting an unusual error when trying to run Podman containers using scrun on SLURM 23.11.3 (and 23.11.1 previously). In short, Podman works when not configured to use scrun, but when configured to use scrun it fails. Podman gives this error: scrun: fatal: Unable to request job alloca