[slurm-users] Re: playing with --nodes=

2024-08-29 Thread Matteo Guglielmi via slurm-users
tasks. You see the result in your variables: SLURM_NNODES=3 SLURM_JOB_CPUS_PER_NODE=1(x3). If you only want 2 nodes, set --nodes=2. Brian Andrus On 8/29/24 08:00, Matteo Guglielmi via slurm-users wrote: Hi, on sbatch's manpage there is this example: --nodes=1,5,9,13 so eit
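A minimal sketch of the suggested fix, assuming the same three-task job discussed in the thread (the full script is not shown in the preview):

    #!/bin/bash
    #SBATCH --ntasks=3
    #SBATCH --nodes=2                  # instead of --nodes=2,4: ask for exactly two nodes
    #SBATCH --constraint="[intel|amd]"
    env | grep SLURM                   # SLURM_NNODES should now report 2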

[slurm-users] Re: playing with --nodes=

2024-08-29 Thread Matteo Guglielmi via slurm-users
Looks like it ignored that and used ntasks with ntasks-per-node as 1, giving you 3 nodes. Check your logs and your conf to see what your defaults are. Brian Andrus On 8/29/2024 5:04 AM, Matteo Guglielmi via slurm-users wrote: Hello, I have a cluster with four Intel nodes (node[01-04], Fe
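One way to inspect those defaults (my own suggestion; the exact commands are not in the thread) is to query the running configuration and the partition definition:

    scontrol show config | grep -Ei 'SelectType|DefMemPer|MaxNodes'
    grep -Ei 'PartitionName|SelectType' /etc/slurm/slurm.conf   # path may differ per distro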

[slurm-users] playing with --nodes=

2024-08-29 Thread Matteo Guglielmi via slurm-users
Hello, I have a cluster with four Intel nodes (node[01-04], Feature=intel) and four AMD nodes (node[05-08], Feature=amd). # job file #SBATCH --ntasks=3 #SBATCH --nodes=2,4 #SBATCH --constraint="[intel|amd]" env | grep SLURM # slurm.conf PartitionName=DEFAULT MinNodes=1 MaxNodes=UNLIMITED
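Reformatted for readability, the job file and slurm.conf fragment quoted above (the slurm.conf line is cut off in the preview, so only the visible options are reproduced):

    # job file
    #SBATCH --ntasks=3
    #SBATCH --nodes=2,4
    #SBATCH --constraint="[intel|amd]"
    env | grep SLURM

    # slurm.conf (truncated in the preview)
    PartitionName=DEFAULT MinNodes=1 MaxNodes=UNLIMITED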

[slurm-users] Multiple Counts Question

2024-08-29 Thread Matteo Guglielmi via slurm-users
Hello, Does anyone know why this is possible in slurm: --constraint="[rack1*2&rack2*4]" and this is not: --constraint="[rack1*2|rack2*4]" ? Thank you. -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
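Written as job-script directives, the accepted and rejected forms from the question look like this (this only restates the question; no answer is given in this thread):

    # accepted by sbatch
    #SBATCH --constraint="[rack1*2&rack2*4]"

    # rejected by sbatch
    #SBATCH --constraint="[rack1*2|rack2*4]"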

Re: [slurm-users] sharing licences with non slurm workers

2023-03-24 Thread Matteo Guglielmi
Just replying to my own mail here: RTFM. So, it was enough to add the following: SchedulerParameters=allow_zero_lic in slurm.conf From: slurm-users on behalf of Matteo Guglielmi Sent: Friday, March 24, 2023 3:03:35 PM To: slurm-us...@schedmd.com
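The fix described above, as it would look in slurm.conf (the Licenses= line is only an illustrative assumption; the original message does not show it):

    # slurm.conf
    SchedulerParameters=allow_zero_lic
    # e.g. a locally counted license pool (hypothetical name and count)
    Licenses=matlab:10

    # then reread the configuration (some settings may require a slurmctld restart)
    scontrol reconfigure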

[slurm-users] sharing licences with non slurm workers

2023-03-24 Thread Matteo Guglielmi
Dear all, we have a license server which allocates licenses to a bunch of workstations not managed with slurm (completely independent boxes) and the nodes of a cluster, all managed with slurm. I wrote a simple script that keeps querying the number of licenses used by the outside "world" a
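The preview cuts off before the script itself, so the following is only a guess at one possible shape of such a loop, assuming a FlexLM-style server queried with lmstat and a pre-created Slurm reservation named ext_lic used to park the externally consumed licenses:

    #!/bin/bash
    # Hypothetical sketch -- not the script from the original message.
    LIC=matlab                                  # license name as defined in slurm.conf (assumed)
    while true; do
        # count licenses checked out by the non-Slurm workstations (lmstat output format assumed)
        used=$(lmstat -f "$LIC" | grep -c ', start')
        # hide that many licenses from Slurm by resizing a standing reservation
        scontrol update ReservationName=ext_lic Licenses=${LIC}:${used}
        sleep 60
    done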

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-07-02 Thread Matteo Guglielmi
at the same time on all nodes A great detective story! > June15 but there is no trace of it anywhere on the disk. Do you have the process ID (pid) of the watchdog.sh? You could look in /proc/(pid)/cmdline and see what that shows On 2 July 2018 at 11:37, Matteo Guglielmi mailto:
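A concrete way to do that check (standard /proc usage, not quoted from the thread):

    # replace <pid> with the PID of watchdog.sh
    tr '\0' ' ' < /proc/<pid>/cmdline; echo
    ls -l /proc/<pid>/cwd /proc/<pid>/exe    # working directory and the binary actually running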

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-07-02 Thread Matteo Guglielmi
creating tasks on the other machines? I would look at the compute nodes while the job is running and do ps -eaf --forest. Also, using mpirun to run a single core gives me the heebie-jeebies... https://en.wikipedia.org/wiki/Heebie-jeebies_(idiom) On 29 June 2018 at 13:16, Matteo Guglielmi mailto:
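One way to run that inspection across all compute nodes (a sketch; the node names and passwordless ssh access are assumptions):

    for n in node0{1..8}; do
        echo "== $n =="
        ssh "$n" 'ps -eaf --forest | grep -B2 -A5 slurmstepd'
    done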

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Matteo Guglielmi
r processes which are started by it on the other compute nodes will be killed. I suspect your user is trying to do something "smart". You should give that person an example of how to reserve 36 cores and submit a charmm job. On 29 June 2018 at 12:13, Matteo Guglielmi mailto:matteo.g
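A sketch of the kind of single-allocation submission being suggested (the input file name and the exact charmm command line are assumptions and site-specific):

    #!/bin/bash
    #SBATCH --job-name=charmm36
    #SBATCH --ntasks=36            # reserve all 36 cores in one allocation
    srun charmm -i job.inp         # one task per reserved core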

[slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Matteo Guglielmi
Dear community, I have a user who usually submits 36 (identical) jobs at a time using a simple for loop, thus jobs are sbatched all at the same time. Each job requests a single core and all jobs are independent of one another (they read different input files and write to different output files). Jobs
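The submission pattern described above presumably looks something like this (the file names and the wrapped command are placeholders; the user's actual script is not shown):

    #!/bin/bash
    # one single-core job per input file, submitted back to back
    for i in $(seq 1 36); do
        sbatch --ntasks=1 --wrap="charmm -i input_${i}.inp > output_${i}.out"
    done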