Re: [slurm-users] How to limit # of execution slots for a given node

2022-01-06 Thread Rémi Palancher
n control over the exact list of reserved CPUs regarding NUMA topology or whatever. -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] Scheduler does not reserve resources

2022-01-17 Thread Rémi Palancher
l reason is the absence of timelimit on the running jobs. In this case Slurm is unable to determine when the running jobs will end, when the next highest priority job can start, and ultimately whether lower priority jobs would actually delay higher priority jobs. -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] big increase of MaxStepCount?

2022-01-19 Thread Rémi Palancher
to handle it gracefully [1]. [1] https://slurm.schedmd.com/high_throughput.html -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] Secondary Unix group id of users not being issued in interactive srun command

2022-01-28 Thread Rémi Palancher
e UID of the shell. The second command resolves johndoe's UID through the nsswitch stack, then looks up the groups of this UID. Do you have johndoe declared in both the local /etc/passwd and the LDAP directory with different UIDs? Do `id` and `id johndoe` return the same UID? -- Rémi Palancher Rackslab

Re: [slurm-users] how to allocate high priority to low cpu and memory jobs

2022-01-28 Thread Rémi Palancher
In addition to Michael's proposal with the partitions, you could also set up a QOS for low memory jobs, with a high priority and MaxTRESPerJob. -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] Limiting srun to a specific partition

2022-02-14 Thread Rémi Palancher
ere: https://bugs.schedmd.com/show_bug.cgi?id=3094 Best, -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] Slurm database field for SystemCPU, UserCPU, TotalCPU

2022-03-18 Thread Rémi Palancher
ec fields from the cluster step_table. The total is computed, it is the sum of these fields, as you can see here: https://github.com/SchedMD/slurm/blob/fd6fef3e14a0c6d1484230744289749c0e4b19d0/src/plugins/accounting_storage/mysql/as_mysql_jobacct_process.c#L1063 Best, -- Rémi Palancher Rackslab: Ope
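
As a rough illustration of the sum described above, here is a minimal Python sketch. The field names (user_sec, user_usec, sys_sec, sys_usec) are an assumption about the step_table columns, since the snippet is truncated, and the StepCpuFields container is hypothetical; the linked source file remains the reference for the actual computation.

```python
from dataclasses import dataclass


@dataclass
class StepCpuFields:
    # Hypothetical container; the field names are assumed to mirror
    # the seconds/microseconds columns stored for each job step.
    user_sec: int
    user_usec: int
    sys_sec: int
    sys_usec: int


def total_cpu_seconds(fields: StepCpuFields) -> float:
    """Return user + system CPU time in seconds (the computed total)."""
    user = fields.user_sec + fields.user_usec / 1_000_000
    system = fields.sys_sec + fields.sys_usec / 1_000_000
    return user + system
```

With 10.5 s of user time and 2.25 s of system time, the computed total is 12.75 s; the total itself is not stored as a separate field.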

Re: [slurm-users] why sacct display wrong username while the UID is right?

2022-03-18 Thread Rémi Palancher
oc field. There might not be NSS resolution in the output. Did the UID of phywht change over time? That would explain why the jobs are associated to this user in the SlurmDBD database. -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

[slurm-users] New future and roadmap for Slurm-web

2023-05-08 Thread Rémi Palancher
announcement can still be found in the archives of this mailing-list! [1] [1] https://groups.google.com/g/slurm-users/c/LiD2Pa8r22A/m/fDHWm5GomJsJ [2] https://www.edf.fr/en [3] https://rackslab.io -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] Configuring slurm.conf and using subpartitions

2023-10-04 Thread Rémi Palancher
.com/slurm.conf.html#SECTION_NODE-CONFIGURATION -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] Response to Rémi Palancher about Configuring slurm.conf and using subpartitions

2023-10-05 Thread Rémi Palancher
Weight value and they will be added to the pool of nodes being > considered for scheduling individually. [1] https://github.com/SchedMD/slurm/blob/10b6d5122b77eae417546d5263757d0ed1b2fd31/src/common/read_config.c#L1667 [2] https://slurm.schedmd.com/slurm.conf.html#OPT_Weight -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] auth_munge.so: Incompatible Slurm plugin version (21.08.8)

2023-10-05 Thread Rémi Palancher
r DRMAA layer against Slurm 21.08.8 headers and library? -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] Slurm account coordinator

2023-10-12 Thread Rémi Palancher
ize the list of accounts users are coordinating with: $ sacctmgr show users WithCoord -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] Configure a user as "admin" only in his/her account

2023-10-18 Thread Rémi Palancher
tudents names=teacher Then teacher will have the ability to cancel students' jobs, among other things (e.g. set limits on students' associations). The teacher won't have any special privileges on other accounts. -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io

Re: [slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?

2023-11-01 Thread Rémi Palancher
] https://github.com/SchedMD/slurm/commit/b31fa177c1ca26dcd2d5cd952e692ef87d95b528 -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io/

Re: [slurm-users] how to configure correctly node and memory when a script fails with out of memory

2023-11-01 Thread Rémi Palancher
sbatch: error: Batch job submission failed: Requested node configuration > is not available Do you have a MaxMemPerCPU on the cluster or on the partition? If this value is too low, this could make the job fail due to CPU count limit. -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io/
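
To see why a low MaxMemPerCPU can turn a memory request into a CPU count problem, here is a simplified Python sketch. It assumes the documented behaviour that Slurm tries to raise the job's CPU count when the requested memory per CPU exceeds the limit; the function name and the single-node, megabyte-based model are illustrative assumptions, not Slurm's actual code.

```python
import math


def effective_cpus(requested_cpus: int, requested_mem_mb: int,
                   max_mem_per_cpu_mb: int) -> int:
    """Estimate the CPU count needed to stay within MaxMemPerCPU.

    Simplified model: enough CPUs are allocated so that the memory
    per CPU does not exceed the configured limit.
    """
    cpus_for_memory = math.ceil(requested_mem_mb / max_mem_per_cpu_mb)
    return max(requested_cpus, cpus_for_memory)
```

In this model, a 1-CPU job asking for 64000 MB with a 2000 MB limit would need 32 CPUs; if no node in the partition offers that many, the request cannot be satisfied.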

Re: [slurm-users] GraceTime is not working, But there is log.

2023-11-08 Thread Rémi Palancher
ram must either trap SIGTERM with a signal handler, or you must enable the send_user_signal flag in PreemptParameters and submit your job with --signal and another signal. -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io/
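
For the first option, a minimal Python sketch of trapping SIGTERM could look like the following. The names state and handle_sigterm are illustrative; a real job would check the flag in its main loop and save its work before the grace time expires.

```python
import os
import signal

# Illustrative flag that the main loop of the job would poll.
state = {"stop_requested": False}


def handle_sigterm(signum, frame):
    """Record that SIGTERM was received instead of dying immediately."""
    state["stop_requested"] = True


# Install the handler; this must be done from the main thread.
signal.signal(signal.SIGTERM, handle_sigterm)
```

Without such a handler, the default action for SIGTERM terminates the process immediately, so the grace time is never used by the program.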

Re: [slurm-users] Graphing job metrics

2017-11-14 Thread Rémi Palancher
Hi there, On 13/11/2017 at 18:18, Nicholas McCollum wrote: Now that there is a slurm-users mailing list, I thought I would share something with the community that I have been working on to see if anyone else is interested in it. I have a lot of students on my cluster and I really wanted a way

[slurm-users] Announcing Slurm-web v3.0.0, open source web dashboard for Slurm

2024-05-13 Thread Rémi Palancher via slurm-users
dmap/ -- Rémi Palancher Rackslab: Open Source Solutions for HPC Operations https://rackslab.io