[slurm-users] Cannot enable Gang scheduling

2023-01-12 Thread Helder Daniel
Hi, I am trying to enable gang scheduling on a server with a 32-core CPU and 4 GPUs. However, with gang scheduling enabled, the CPU jobs (or GPU jobs) are not being preempted after the time slice, which is set to 30 seconds. Below is a snapshot of squeue. There are 3 jobs, each needing 32 cores. The first

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Helder Daniel
MemPerNode=UNLIMITED

On Fri, 13 Jan 2023 at 11:16, Kevin Broch wrote:
> Problem might be that OverSubscribe is not enabled? w/o it, I don't
> believe the time-slicing can be GANG scheduled
>
> Can you do a "scontrol show partition" to verify that it is?
>
> On Thu, J

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Helder Daniel
/usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 524226 C /bin/python 15362MiB |
+-+

On Fri, 13 Jan 2023 at 12:08, Helder Daniel wrote:
> Hi Kevin
>
> I did a "scontrol show partition".
>

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Helder Daniel
ble with GANG,SUSPEND. GPU memory isn't
> managed in Slurm so the idea of suspending GPU memory for another job to
> use the rest simply isn't possible.
>
> On Fri, Jan 13, 2023 at 4:08 AM Helder Daniel wrote:
>
>> Hi Kevin
>>
>> I did a "scontrol

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Helder Daniel
lly others in the group have some
> ideas/explanations. I haven't had to deal with GPU resources in Slurm.
>
> On Fri, Jan 13, 2023 at 4:51 AM Helder Daniel wrote:
>
>> Oh, ok.
>> I guess I was expecting that the GPU job was suspended copying GPU memory
>> to R