Re: [slurm-users] How to avoid a feature?

2021-07-02 Thread Ward Poelmans
Hi Tina, On 2/07/2021 13:42, Tina Friedrich wrote: > We did think about having 'hidden' GPU partitions instead of wrangling it > with features, but there didn't seem to be any benefit to that that we could  > see. The benefit with partitions is that you can set a bunch of options that are not p

Re: [slurm-users] nodes that finished calculation do not become idle

2021-07-02 Thread Grigory Ptashko
Job array is working like magic for me. Thank you very much for the hint! > On June 27, 2021, at 17:40, Brian Andrus wrote: > > I suspect you are misunderstanding how the flow works. > > You request X nodes to do some work. > You start a job that uses all the nodes. > Job runs until everythi

Re: [slurm-users] How to avoid a feature?

2021-07-02 Thread Tina Friedrich
:) That was the first thing we tried/did - however, that only works if your cluster isn't habitually 100% busy with jobs waiting. So that didn't work very well - even with the weighting set up so that the GPU were 'last resort' (after all the special high memory nodes), they were always runn

Re: [slurm-users] How to avoid a feature?

2021-07-02 Thread Jeffrey R. Lang
How about using node weights? Weight the non-GPU nodes so that they are scheduled first. The GPU nodes could have a very high weight so that the scheduler would consider them last for allocation. This would allow the non-GPU nodes to be filled first and, when full, schedule the GPU nodes. Us

Re: [slurm-users] How to avoid a feature?

2021-07-02 Thread Tina Friedrich
Hi Loris, we didn't want to have too many partitions, mainly; so we were after a way to have the GPU nodes not separated out. Partly it is because we wanted to be able to easily use 'idle' CPUs on GPU nodes - although I currently only allow that on some of them (I simply also tag them with '

[slurm-users] Re: Re: Is there bug in PrivateData=jobs option of slurmdbd?

2021-07-02 Thread taleintervenor
Well, you got the point. We didn’t configure LDAP on the Slurm database node. After configuring LDAP authorization the PrivateData option finally worked as expected. Thanks for the assistance. From: Brian Andrus Sent: July 1, 2021 21:57 To: taleinterve...@sjtu.edu.cn Cc: slurm-users@lists.schedmd

Re: [slurm-users] ML Training task killed(SIGKILL) when cgroup cpu limit enabled in slurm15.08

2021-07-02 Thread Jack Chen
ok, thanks for your quick response, I will find a way to upgrade it. On Fri, Jul 2, 2021 at 2:12 PM Ole Holm Nielsen wrote: > On 7/2/21 7:34 AM, Jack Chen wrote: > > Slurm is great to use, I've developed several plugins on it. Now I'm > > working on an issue in slurm. > > > > I'm using Slurm 15.