[slurm-users] Re: SLURM configuration for LDAP users

2024-02-05 Thread Loris Bennett via slurm-users
Hi Richard, Richard Chang via slurm-users writes: > Job submission works for local users. I was not aware we need to manually add the LDAP users to the SlurmDB. Does it mean we need to add each and every user in LDAP to the Slurm database? We add users to the Slurm DB automatically with
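
For reference, since the preview above is cut off: the standard tool for populating the Slurm accounting database is sacctmgr. A minimal sketch, assuming a plain account/user layout (the account and user names are placeholders, not taken from the thread):

    # create an account once, then add an LDAP user to it
    sacctmgr -i add account physics Description="physics group" Organization=physics
    sacctmgr -i add user alice DefaultAccount=physics

Sites typically wrap commands like these in a script driven by an LDAP query, which is presumably what "automatically" refers to above.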

[slurm-users] Re: Starting a job after a file is created in previous job (dependency looking for solution)

2024-02-06 Thread Loris Bennett via slurm-users
Hi Amjad, Amjad Syed via slurm-users writes: > Hello > I have the following scenario: I need to submit a sequence of up to 400 jobs where the even jobs depend on the preceding odd job to finish and every odd job depends on the presence of a file generated by the preceding even job (a
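
A minimal sketch of the chaining part, assuming plain afterok dependencies (script names and the file path are placeholders, not taken from the thread):

    # submit the chain so each job starts only after the previous one
    # finished successfully
    prev=$(sbatch --parsable job_001.sh)
    for i in $(seq 2 400); do
        prev=$(sbatch --parsable --dependency=afterok:${prev} "$(printf 'job_%03d.sh' "$i")")
    done

    # inside an odd-numbered job script, wait for the file the preceding
    # even job should have produced
    while [ ! -e /scratch/step_output ]; do sleep 30; done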

[slurm-users] job_submit.lua - uid in Docker cluster

2024-02-14 Thread Loris Bennett via slurm-users
Hi, Having used https://github.com/giovtorres/slurm-docker-cluster successfully a couple of years ago to develop a job_submit.lua plugin, I am trying to do this again. However, the plugin which works on our current cluster (CentOS 7.9, Slurm 23.02.7) fails in the Docker cluster (Rocky 8.9, S

[slurm-users] Re: Suggestions for Partition/QoS configuration

2024-04-04 Thread Loris Bennett via slurm-users
Hi Thomas, "thomas.hartmann--- via slurm-users" writes: > Hi, we're testing possible slurm configurations on a test system right now. Eventually, it is going to serve ~1000 users. We're going to have some users who are going to run lots of short jobs (a couple of minutes to ~4h) and s
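
One possible direction, purely as a sketch and not the configuration the thread arrived at: separate QoS levels for short and long jobs, with limits that stop long jobs from filling the whole cluster (all values are placeholders):

    sacctmgr -i add qos short MaxWall=04:00:00 Priority=100
    sacctmgr -i add qos long MaxWall=14-00:00:00 Priority=10 GrpTRES=cpu=512

These can then be attached to partitions via the partition's QOS= parameter in slurm.conf.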

[slurm-users] Re: Avoiding fragmentation

2024-04-08 Thread Loris Bennett via slurm-users
Hi Gerhard, Gerhard Strangar via slurm-users writes: > Hi, I'm trying to figure out how to deal with a mix of few- and many-cpu jobs. By that I mean most jobs use 128 cpus, but sometimes there are jobs with only 16. As soon as that job with only 16 is running, the scheduler splits the
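
One commonly suggested way to limit this kind of fragmentation, offered here only as a sketch (the thread's outcome is not shown above), is to confine the small jobs to a subset of nodes so that the 128-core jobs keep whole nodes available:

    # illustrative slurm.conf fragment; node names and counts are placeholders
    PartitionName=small Nodes=node[01-04] MaxTime=3-00:00:00 Default=NO
    PartitionName=big Nodes=node[05-64] MaxTime=3-00:00:00 Default=YES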

[slurm-users] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
Hi Dietmar, Dietmar Rieder via slurm-users writes: > Hi, is it possible to have slurm scheduling jobs automatically according to the "-t" time requirements to a fitting partition? e.g. 3 partitions: PartitionName=standard Nodes=c-[01-10] Default=YES MaxTime=04:00:00 DefaultTime=00:1
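
One way to get this effect without a job_submit plugin, given here only as a sketch (the partition names beyond "standard" are placeholders): list every partition at submission time and set EnforcePartLimits=ANY in slurm.conf, so a job is accepted as long as its --time fits at least one of the listed partitions and runs in one whose MaxTime it satisfies:

    sbatch --partition=standard,medium,long --time=12:00:00 job.sh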

[slurm-users] Re: [EXTERN] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
Hi Dietmar, Dietmar Rieder via slurm-users writes: > Hi Loris, > On 4/30/24 2:53 PM, Loris Bennett via slurm-users wrote: >> Hi Dietmar, >> Dietmar Rieder via slurm-users writes: >>> Hi, >>> is it possible to have slur

[slurm-users] Re: [EXTERN] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
Hi Dietmar, Dietmar Rieder via slurm-users writes: > Hi Loris, > On 4/30/24 3:43 PM, Loris Bennett via slurm-users wrote: >> Hi Dietmar, >> Dietmar Rieder via slurm-users writes: >>> Hi Loris, >>> On 4/30/24 2

[slurm-users] Re: GPU GRES verification and some really broad questions.

2024-05-10 Thread Loris Bennett via slurm-users
Hi, Shooktija S N via slurm-users writes: > Hi, I am a complete slurm-admin and sys-admin noob trying to set up a 3 node Slurm cluster. I have managed to get a minimum working example running, in which I am able to use a GPU (NVIDIA GeForce RTX 4070 ti) as a GRES. This is slurm.co
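
For comparison, a minimal single-GPU GRES setup looks roughly like this (node name, CPU/memory sizes and the device path are placeholders, not the poster's values):

    # gres.conf on the compute node
    NodeName=node01 Name=gpu Type=rtx4070ti File=/dev/nvidia0

    # slurm.conf
    GresTypes=gpu
    NodeName=node01 Gres=gpu:rtx4070ti:1 CPUs=16 RealMemory=64000 State=UNKNOWN

    # quick functional check
    srun --gres=gpu:1 nvidia-smi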

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-05 Thread Loris Bennett via slurm-users
Ryan Novosielski via slurm-users writes: > We do have bf_continue set. And also bf_max_job_user=50, because we discovered that one user can submit so many jobs that it will hit the limit of the number it’s going to consider and not run some jobs that it could otherwise run. > On Jun 4
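
Both options mentioned above live in SchedulerParameters; an illustrative slurm.conf line (only bf_continue and bf_max_job_user=50 come from the message, the other values are placeholders):

    SchedulerParameters=bf_continue,bf_max_job_user=50,bf_window=10080,bf_max_job_test=1000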

[slurm-users] Re: sbatch: Node count specification invalid - when only specifying --ntasks

2024-06-10 Thread Loris Bennett via slurm-users
Hi George, George Leaver via slurm-users writes: > Hello, Previously we were running 22.05.10 and could submit a "multinode" job using only the total number of cores to run, not the number of nodes. For example, in a cluster containing only 40-core nodes (no hyperthreading), Slurm woul
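
The style of submission being described, as a sketch (partition name and script are placeholders): the total task count only, with no explicit node count, on a cluster of 40-core nodes:

    sbatch --ntasks=80 --partition=multinode job.sh   # 80 tasks, no --nodes given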

[slurm-users] Re: sbatch: Node count specification invalid - when only specifying --ntasks

2024-06-11 Thread Loris Bennett via slurm-users
Hi George, George Leaver via slurm-users writes: > Hi Loris, >> Doesn't splitting up your jobs over two partitions mean that either one of the two partitions could be full, while the other has idle nodes? > Yes, potentially, and we may move away from our current config at some point

[slurm-users] Re: How to exclude master from computing? Set to DRAINED?

2024-06-24 Thread Loris Bennett via slurm-users
Hi Xaver, Xaver Stiensmeier via slurm-users writes: > Dear Slurm users, in our project we exclude the master from computing before starting Slurmctld. We used to exclude the master from computing by simply not mentioning it in the configuration, i.e. just not having: Partition
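
The "set it to drained" variant from the subject line, as a sketch (hostname and reason text are placeholders):

    scontrol update NodeName=master State=DRAIN Reason="head node - no compute jobs"

The alternative described above is simply to leave the head node out of every partition's Nodes= list.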

[slurm-users] Re: Unable to run sequential jobs simultaneously on the same node

2024-08-18 Thread Loris Bennett via slurm-users
Dear Arko, Arko Roy via slurm-users writes: > I want to run 50 sequential jobs (essentially 50 copies of the same code with different input parameters) on a particular node. However, as soon as one of the jobs gets executed, the other 49 jobs get killed immediately with exit code 9.
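
For 50 independent copies of the same code with different inputs, the usual pattern is a job array; a sketch only, not the poster's script (program name, input naming, and resource values are placeholders):

    #!/bin/bash
    #SBATCH --array=1-50
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=2G
    #SBATCH --time=02:00:00

    ./my_code input_${SLURM_ARRAY_TASK_ID}.dat

Sized like this, several array tasks can run on the same node at once, provided the node's resources and the cluster's limits allow it.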

[slurm-users] Re: Unable to run sequential jobs simultaneously on the same node

2024-08-19 Thread Loris Bennett via slurm-users
Dear Arko, Arko Roy writes: > Thanks Loris and Gareth. Here is the job submission script. If you find any errors please let me know. Since I am not the admin but just a user, I think I don't have access to the prolog and epilogue files. > If the jobs are independent, why do you want to

[slurm-users] salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Loris Bennett via slurm-users
Hi, With

    $ salloc --version
    slurm 23.11.10

and

    $ grep LaunchParameters /etc/slurm/slurm.conf
    LaunchParameters=use_interactive_step

the following

    $ salloc --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 --qos=standard
    salloc: Granted job allocation 18928869
    sal
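
A quick check worth doing in this situation, as a sketch: confirm that the running slurmctld actually has the parameter, rather than only the copy of slurm.conf on disk:

    scontrol show config | grep -Ei 'LaunchParameters|InteractiveStepOptions'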

[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Loris Bennett via slurm-users
02068@lt10000 ~]$ > Best Regards, > Carsten > On 05.09.24 at 14:17, Loris Bennett via slurm-users wrote: > > Hi, > > With > > $ salloc --version > > slurm 23.11.10 > > and

[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-06 Thread Loris Bennett via slurm-users
e data points. Cheers, Loris > -Paul Edmon- > On 9/5/24 10:22 AM, Loris Bennett via slurm-users wrote: >> Jason Simms via slurm-users writes: >>> Ours works fine, however, without the InteractiveStepOptions parameter. >> My assumption is also that default v
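
For reference, the parameter being discussed, shown with what the slurm.conf documentation gives as its default (assuming a recent Slurm version; verify against the local man page):

    LaunchParameters=use_interactive_step
    InteractiveStepOptions="--interactive --preserve-env --pty $SHELL"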