Re: [slurm-users] [ext] Enforce gpu usage limits (with GRES?)

2023-02-02 Thread Holtgrewe, Manuel
Hi, if by "share the GPU" you mean exclusive allocation to a single job, then, I believe, you are missing the cgroup configuration for isolating access to the GPU. Below are the relevant parts (I believe) of our configuration. There also is a way of time- and space-slicing GPUs, but I guess you should
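For context, the "relevant parts" the reply refers to are presumably along these lines — a minimal sketch of GPU isolation via GRES plus cgroup device containment, with illustrative node and device names, not the poster's actual files:

```
# slurm.conf -- track GPUs as a GRES and use cgroup-based task containment
GresTypes=gpu
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf -- confine each job to only the devices it was allocated
ConstrainDevices=yes

# gres.conf -- map the GPU device files (node name and paths illustrative)
NodeName=gpu-node-01 Name=gpu File=/dev/nvidia0
NodeName=gpu-node-01 Name=gpu File=/dev/nvidia1
```

With ConstrainDevices=yes, a job that was allocated one GPU cannot see or use the other device files on the node, which is what makes the allocation effectively exclusive.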

Re: [slurm-users] [ext] Access reservation list from job_submit.lua?

2021-09-01 Thread Holtgrewe, Manuel
reservations) do slurm.log_user("i = %s", inspect(a)) slurm.log_user(inspect(b.start_time)) slurm.log_user("--") end might print srun: i = "root_1" srun: 1630598400 srun: -- Maybe that's useful for someone. Best wishes, Manuel
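The flattened snippet above was presumably a loop like the following (reconstructed; `inspect` is the third-party inspect.lua pretty-printer the poster appears to use, and the exact shape of the `slurm.reservations` table is an assumption). This only runs inside slurmctld's Lua environment, so it is not standalone-runnable:

```lua
-- Sketch: iterate Slurm's global reservation table from job_submit.lua.
local inspect = require("inspect")  -- third-party pretty-printer (assumption)

function slurm_job_submit(job_desc, part_list, submit_uid)
    for a, b in pairs(slurm.reservations) do
        slurm.log_user("i = %s", inspect(a))   -- reservation name, e.g. "root_1"
        slurm.log_user(inspect(b.start_time))  -- epoch start time, e.g. 1630598400
        slurm.log_user("--")
    end
    return slurm.SUCCESS
end
```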

[slurm-users] Access reservation list from job_submit.lua?

2021-09-01 Thread Holtgrewe, Manuel
Hi, I have trouble finding examples of job_submit.lua code that uses the global Slurm reservations. I want to adjust the maximal running time of jobs with respect to the closest reservation flagged as "maintenance". However, I cannot find any job_submit.lua code that accesses the Slurm

[slurm-users] Disjoint partitions & jobs stuck in JobState=PENDING Reason=Priority

2021-03-23 Thread Holtgrewe, Manuel
Dear all, I'm using the slurm.conf file from the attachment. I have some partitions, e.g., "long", that apply to the same list of nodes but are disjoint from the partition "highmem". "highmem" only allows a user to use one node at a time. I have a user who has submitted many jobs to

Re: [slurm-users] [ext] Re: Jobs getting StartTime 3 days in the future?

2020-08-31 Thread Holtgrewe, Manuel
it’ll get pushed back like you see here. On Aug 31, 2020, at 12:13 PM, Holtgrewe, Manuel wrote: Dear all, I'm seeing some users' jobs getting a StartTime 3 days in the future although there are plenty of resources available in the partition (and the user is well below MaxTRESPU of the

[slurm-users] Jobs getting StartTime 3 days in the future?

2020-08-31 Thread Holtgrewe, Manuel
Dear all, I'm seeing some users' jobs getting a StartTime 3 days in the future although there are plenty of resources available in the partition (and the user is well below MaxTRESPU of the partition). Attached is our slurm.conf and the dump of "sacctmgr list qos -P". I'd be grateful for

[slurm-users] Limiting number of CPUs per user

2020-08-20 Thread Holtgrewe, Manuel
Dear all, I have a Slurm setup that has a couple of partitions and I want to limit the number of CPUs that each user can use in each partition. I'm trying to do this by attaching a qos to each partition and setting "MaxTRESPerUser" in each to, e.g., cpu=1000. It looks like this setting is
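The setup described in the post looks roughly like this — a hedged sketch with illustrative QOS, partition, and node names, not the poster's configuration:

```
# Create a QOS with a per-user CPU cap (name and value illustrative)
sacctmgr add qos normal-cpu MaxTRESPerUser=cpu=1000

# slurm.conf: attach the QOS to the partition so its limits apply there
PartitionName=normal Nodes=cn[001-100] QOS=normal-cpu

# slurm.conf: QOS limits are only enforced with this setting
AccountingStorageEnforce=limits,qos
```

A common pitfall is the last line: without `qos` (and `limits`) in AccountingStorageEnforce, a MaxTRESPerUser setting on the QOS silently has no effect, which may explain "it looks like this setting is" being ignored.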

[slurm-users] Oversubscribe until 100% load?

2020-06-11 Thread Holtgrewe, Manuel
Hi, I have some trouble understanding the "Oversubscribe" setting completely. What I would like is to oversubscribe nodes to increase overall throughput. - Is there a way to oversubscribe by a certain fraction, e.g. +20% or +50%? - Is there a way to stop if a node reaches 100% "Load"? Is there
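For context: as far as I know, stock Slurm has no fractional (+20%) oversubscription knob and no load-based cutoff; the closest built-in mechanism is a fixed per-resource ratio on the partition, sketched here with illustrative names:

```
# slurm.conf: allow up to 2 jobs per schedulable resource on these nodes,
# i.e. roughly "+100%" oversubscription. The ratio is fixed per partition;
# Slurm will not throttle based on the node's measured load.
PartitionName=throughput Nodes=cn[001-050] OverSubscribe=FORCE:2
```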

Re: [slurm-users] [ext] Re: Make "srun --pty bash -i" always schedule immediately

2020-06-11 Thread Holtgrewe, Manuel
>>> Generally the way we've solved this is to set aside a specific set of nodes in a partition for interactive sessions. We deliberately scale the size of the resources so that users will always run immediately and we also set a QoS on the partiti

[slurm-users] Make "srun --pty bash -i" always schedule immediately

2020-06-11 Thread Holtgrewe, Manuel
Hi, is there a way to make interactive logins where users will use almost no resources "always succeed"? In most of these interactive sessions, users will have mostly idle shells running and do some batch job submissions. Is there a way to allocate "infinite virtual cpus" on each node that

Re: [slurm-users] [ext] Re: [External] Defining a default --nodes=1

2020-05-10 Thread Holtgrewe, Manuel
amongst multiple nodes. Mike From: slurm-users on behalf of "Holtgrewe, Manuel" Reply-To: Slurm User Community List Date: Friday, May 8, 2020 at 03:28 To: "slurm-users@lists.schedmd.com" Subject: [External] [slurm-users] Defining a default --nodes=1 CAUTION: This email ori

[slurm-users] Defining a default --nodes=1

2020-05-08 Thread Holtgrewe, Manuel
Dear all, we're running a cluster where the large majority of jobs will use multi-threading and no message passing. Sometimes CPU>1 jobs are scheduled to run on more than one node (which would be fine for MPI jobs of course...) Is it possible to automatically set "--nodes=1" for all jobs
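A common answer to this question is a job_submit.lua that pins the node count when the user left it unset. A minimal sketch, assuming the usual job_desc field names (NO_VAL meaning "not specified"); it only runs inside slurmctld:

```lua
-- Sketch: default jobs to a single node unless the user asked otherwise.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.min_nodes == slurm.NO_VAL then
        job_desc.min_nodes = 1
        job_desc.max_nodes = 1
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```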

[slurm-users] How to display the effective QOS of a job?

2020-05-08 Thread Holtgrewe, Manuel
Hi, is it possible to display the effective QOS of a job? I need to investigate some unexpected behaviour of the scheduler (at least unexpected to me at the moment). I want to limit maximum number of CPUs per user in each partition. It is my understanding from the documentation that partition

[slurm-users] Reading which GPUs were assigned to which job

2020-04-23 Thread Holtgrewe, Manuel
Dear all, is it possible to find out which GPU was assigned to which job through squeue or sacct? My motivation is as follows: some users write jobs with bad resource usage (e.g., 1h CPU to precompute, followed by 1h GPU to process, and so on). I don't care so much about CPUs at the moment as
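While a job is running, `scontrol show job -d <jobid>` includes the allocated GPU indices in its per-node detail line (the exact GRES field format varies by Slurm version). The sketch below parses a hard-coded sample of that line so the extraction itself is reproducible without a cluster; on a live system you would pipe the real scontrol output instead:

```shell
# Illustrative sample of a `scontrol show job -d` detail line (format varies):
sample='   Nodes=gpu-node-01 CPU_IDs=0-3 Mem=16384 GRES=gpu:tesla:1(IDX:2)'
# Pull the index list out of the "(IDX:...)" part of the GRES field.
gpu_idx=$(printf '%s\n' "$sample" | sed -n 's/.*(IDX:\([0-9,-]*\)).*/\1/p')
echo "assigned GPU index: $gpu_idx"
```

Note that sacct does not retain the per-device index in older Slurm versions, so querying while the job runs (or logging the index from a prolog) is the more reliable route.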

[slurm-users] Limiting number of *cores* used by each user in a partition

2020-04-06 Thread Holtgrewe, Manuel
Dear all, I would like to limit the number of cores used by each user in a partition. Is this possible? Thanks, -- Dr. Manuel Holtgrewe, Dipl.-Inform. Bioinformatician Core Unit Bioinformatics – CUBI Berlin Institute of Health / Max Delbrück Center for Molecular Medicine in the Helmholtz