I think it's related to the job step launch semantic change introduced in
20.11.0, which was reverted in 20.11.3; see
https://www.schedmd.com/news.php for details.
Cheers,
Angelos
(Sent from mobile, please pardon me for typos and cursoriness.)
> On 26/2/2021 9:07, Volker Blum wrote:
>
> H
Hi Jianwen,
I guess the -p or -P flag does what you want?
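
If the delimited output then needs post-processing, a minimal Python sketch
could look like the one below. It assumes the command is one that prints a
header line followed by "|"-separated fields with no trailing delimiter (the
original message does not name the command); the sample text and the
parse_delimited helper are made up for illustration.

```python
import csv
import io

def parse_delimited(text, delimiter="|"):
    """Parse header + rows of delimiter-separated text into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

# Hypothetical sample, not real accounting output.
sample = (
    "JobID|State|Elapsed\n"
    "1234|COMPLETED|00:05:10\n"
    "1234.batch|COMPLETED|00:05:10\n"
)
rows = parse_delimited(sample)
```

Each row is then a dict keyed by the header names, e.g. rows[0]["State"].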
Best regards,
Angelos
> On 9/2/2021 21:46, SJTU wrote:
>
> Hi,
>
> I am using SLURM 19.05.7. Is it possible to insert user-defined
> separating characters like "|" or ","
In SuspendProgram and its helper programs, I have some logic that makes sure
the node to be acted on is in the idle state before any power action is performed.
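
Roughly, the check can be sketched as below. This is not my actual script,
just a minimal illustration: the function names are hypothetical, it assumes
sinfo is on PATH, and it assumes that trailing flag characters on the state
string (such as "~" or "*") can be stripped before comparing against "idle".

```python
import subprocess

# Assumption: these trailing characters are state flags, not part of the base state.
STATE_FLAGS = "*~#%$@!+^-"

def query_state(node):
    """Ask sinfo for one node's state; returns '' if nothing is reported."""
    out = subprocess.run(
        ["sinfo", "-h", "-n", node, "-o", "%T"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # A node in several partitions is listed once per partition; take the first line.
    return out.splitlines()[0] if out else ""

def is_idle(state):
    """True if the base state, with flag suffixes removed, is 'idle'."""
    return state.strip().rstrip(STATE_FLAGS).lower() == "idle"

def nodes_safe_to_power_off(nodes, get_state=query_state):
    """Keep only the nodes whose current state is idle."""
    return [n for n in nodes if is_idle(get_state(n))]
```

The power action would then be applied only to the returned list, and the
skipped nodes logged for later inspection.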
Best regards,
Angelos
> On 2020/08/24 17:42, Jacek Budzowski wrote:
>
>
> Dear
Agreed. You may also want to write a script that gathers the list of programs in
"D state" (uninterruptible kernel wait) and prints their stacks, and configure it
as UnkillableStepProgram so that you can capture the programs and relevant system
calls that caused the job to become unkillable / timed out exiting for f
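
A minimal sketch of such a script, assuming Linux procfs and root privileges
(reading /proc/<pid>/stack normally requires root). The function names are
made up for illustration; nothing here is Slurm-specific.

```python
import os

def parse_state(stat_text):
    """Return the one-letter process state from /proc/<pid>/stat content."""
    # The command name may contain spaces or ')', so split on the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]

def d_state_pids(proc="/proc"):
    """List PIDs whose state is 'D' (uninterruptible sleep)."""
    pids = []
    try:
        entries = os.listdir(proc)
    except OSError:
        return pids  # no procfs available
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                if parse_state(f.read()) == "D":
                    pids.append(int(entry))
        except (OSError, IndexError):
            continue  # process exited or entry unreadable
    return sorted(pids)

def read_stack(pid, proc="/proc"):
    """Return the kernel stack of a process, or a note if it cannot be read."""
    try:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<stack unavailable: {exc}>"

def report(proc="/proc"):
    """Build a text report with the kernel stack of every D-state process."""
    parts = []
    for pid in d_state_pids(proc):
        parts.append(f"--- pid {pid} ---\n{read_stack(pid, proc)}")
    return "\n".join(parts)
```

A wrapper that writes report() to a log file could then be set as
UnkillableStepProgram in slurm.conf.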
If it's an Ethernet problem, there should be a kernel message (dmesg) showing
either a link/carrier change or a driver reset?
OP's problem could have been caused by excessive paging; maybe check the -M flag
of slurmd? https://slurm.schedmd.com/slurmd.html
Regards,
Angelos
(Sent from mobile, please pardon me fo
Hi Timo,
We faced a similar problem, and our solution was to run an hourly cron job that
sets a random weight for each node. It works pretty well for us.
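
The idea can be sketched as below. This is not our actual cron job; the node
names, the weight range and the function names are placeholders, and it
assumes weights are changed with "scontrol update NodeName=... Weight=...".

```python
import random
import subprocess

def weight_commands(nodes, rng=None, low=1, high=100):
    """Build one 'scontrol update' command per node with a random weight."""
    rng = rng or random.Random()
    return [
        ["scontrol", "update", f"NodeName={node}",
         f"Weight={rng.randint(low, high)}"]
        for node in nodes
    ]

def apply_commands(commands, dry_run=True):
    """Print the commands, or actually run them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling apply_commands(weight_commands(["node01", "node02"])) only prints the
commands; pass dry_run=False from the cron job to apply them.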
Best regards,
Angelos
> On 2020/07/03 2:24, Timo Rothenpieler wrote:
>
> Hel
Hi Gizo,
I noticed that, inside salloc, SLURM_CONF was set to a broken socket,
which is why sinfo was confused.
A workaround I found is to "unset SLURM_CONF" before running sinfo; then
sinfo works.
Maybe a bug needs to be reported for this.
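
For scripts that call sinfo from inside the allocation, the same workaround
can be sketched in Python as below. The helper names are hypothetical; the
only assumption is that dropping SLURM_CONF from the child environment has the
same effect as "unset SLURM_CONF" in the shell.

```python
import os
import subprocess

def env_without(name, environ=None):
    """Return a copy of the environment with one variable removed."""
    env = dict(os.environ if environ is None else environ)
    env.pop(name, None)
    return env

def run_sinfo(extra_args=()):
    """Run sinfo with SLURM_CONF removed from its environment."""
    return subprocess.run(
        ["sinfo", *extra_args],
        env=env_without("SLURM_CONF"),
        capture_output=True, text=True,
    )
```

The current process environment is left untouched, so other tools started
from the same script still see the original value.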
Best regards,
Angelos
On 3/4/20 2:07 AM, nan...@luis.un
Hi all,
It looks like --conf-server is limited to a single config server, if I'm not
mistaken?
Specifying --conf-server multiple times causes slurmd to consider only
the last one.
(A quick glance at the source seems to agree.)
Is there any plan to accept a second server via command line options?
Thanks & r