nvidia-persistenced is a daemon that is installed with the NVIDIA driver.
Setting it to start at boot helps slurmd find the GPUs when it starts.
This web page has some information about it:
https://download.nvidia.com/XFree86/Linux-x86_64/396.
Do they show up as runaway jobs?
sacctmgr show runawayjobs
If they do, it should give you the option to fix them.
Jeff
From: slurm-users On Behalf Of Reed Dier
Sent: Tuesday, December 20, 2022 9:54 AM
To: Slurm User Community List
Subject: [slurm-users] Job cancelled into the future
Hoping
Not sure if this will help. This page says which user executes each of the scripts:
https://slurm.schedmd.com/prolog_epilog.html
Maybe the variable isn't set for the user executing the prolog/epilog/taskprolog
Jeff
From: slurm-users on behalf of Davide
DelVento
Sent: Sat
It could be because the epilog script doesn't have a PATH set by default, for
security reasons, so maybe it isn't finding the commands echo or chmod.
https://slurm.schedmd.com/prolog_epilog.html
Do you have /var/slurm/etc/epilog-test on your nodes?
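A minimal sketch of that idea, assuming a POSIX shell epilog: set PATH at the top of the script so that external commands such as chmod resolve. The directory list, the EPILOG_LOG variable and its file location are assumptions for illustration only, not values from the original thread.

```shell
#!/bin/sh
# Hypothetical epilog sketch. Slurm does not set a search path for
# prolog/epilog scripts, so set one explicitly (or use full paths).
PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin   # assumed directory list
export PATH

# Assumed output file, used here only to show that the commands resolve.
EPILOG_LOG="${EPILOG_LOG:-/tmp/epilog-test.$$}"

echo "epilog ran for job ${SLURM_JOB_ID:-unknown}" > "$EPILOG_LOG"
chmod 600 "$EPILOG_LOG"
```

The alternative is to call every external command by its full path, for example /bin/chmod, and leave PATH unset.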
Jeff
From: slurm-users
In slurm.conf, we just add the Features to the node description. Is that what
you were looking for?
NodeName=compute-4-4 ... Weight=15 Feature=gen10
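Once a node lists a feature, a job can ask for it with a constraint. A minimal sketch of a batch script, assuming the gen10 feature from the line above; the job name and the echo line are illustrative only.

```shell
#!/bin/sh
# Hypothetical job name, for illustration.
#SBATCH --job-name=feature-demo
# Only run on nodes whose definition lists the gen10 feature.
#SBATCH --constraint=gen10

# SLURMD_NODENAME is set by Slurm inside a job; fall back to uname elsewhere.
node="${SLURMD_NODENAME:-$(uname -n)}"
echo "running on ${node}"
```

The same request can be made on the command line with sbatch --constraint=gen10.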
Jeff
UH IT - HPC
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Hanby, Mike
Sent: Thursday, June 2, 2022 2:06 PM
Are the jobs getting assigned memory amounts that would only allow 16
processors to be used when the jobs are running on the node?
Jeff
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Guertin, David S.
Sent: Wednesday, April 6, 2022 9:21 AM
To: slurm-users@lists.sc
The only way I found was to add the user to the new account and then modify
their default account to the new account.
sacctmgr add user example account=groupb
sacctmgr modify user where user=example set defaultaccount=groupb
Then you can remove them from the original default account if you want.
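The steps can be wrapped in a small helper, sketched here under some assumptions: the function name move_default_account and the DRY_RUN switch are hypothetical, groupa stands in for the original account, and the delete form follows the sacctmgr man page examples. The -i flag tells sacctmgr to commit without asking for confirmation.

```shell
#!/bin/sh
# Hypothetical helper: move a user's default account from OLD to NEW.
# Set DRY_RUN=1 to print the sacctmgr commands instead of running them.
move_default_account() {
    user=$1
    old=$2
    new=$3
    run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi

    # Add the user to the new account, then make it the default.
    $run sacctmgr -i add user "$user" account="$new" || return 1
    $run sacctmgr -i modify user where user="$user" set defaultaccount="$new" || return 1
    # Optional last step: drop the association with the old account.
    $run sacctmgr -i delete user name="$user" account="$old" || return 1
}

# Example (prints the commands only):
#   DRY_RUN=1 move_default_account example groupa groupb
```

Running it with DRY_RUN=1 first is a cheap way to check the commands before they touch the accounting database.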
Were you thinking of this
* Report current jobs that have been orphaned on the local cluster and are
now runaway:
sacctmgr show RunawayJobs
From: slurm-users on behalf of Brian
Andrus
Sent: Monday, March 1, 2021 11:14 AM
To: slurm-users@lists.schedmd.co
In our taskprolog file we have something like
#!/bin/sh
echo export SCRATCHDIR=/scratch/${SLURM_JOBID}
From: slurm-users on behalf of Herc
Silverstein
Sent: Friday, February 12, 2021 3:12 PM
To: slurm-us...@schedmd.com
Subject: [slurm-users] prolog not passin
If you run slurmd -C on the compute node, it should tell you what slurm
thinks the RealMemory number is.
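To pull just that number out of the output, something like the following might work. It is a sketch: the helper name real_memory is hypothetical, and it assumes the output contains a RealMemory=<number> field, with the value in megabytes as in slurm.conf.

```shell
#!/bin/sh
# Hypothetical helper: read `slurmd -C` output on stdin and print the
# RealMemory value that slurmd detected. Prints nothing if the field is absent.
real_memory() {
    sed -n 's/.*RealMemory=\([0-9][0-9]*\).*/\1/p'
}

# Usage on a compute node:
#   slurmd -C | real_memory
```

The printed value can then be compared with the RealMemory set for that node in slurm.conf.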
Jeff
From: slurm-users on behalf of navin
srivastava
Sent: Friday, July 10, 2020 6:24 AM
To: Slurm User Community List
Subject: Re: [slurm-users] change
How do you have fabricnode2 defined in your gres.conf file and the slurm.conf
file? Since the type of gpu changed, maybe the definition for it needs to be
updated also.
Jeff
From: slurm-users on behalf of Dean
Schulze
Sent: Monday, April 27, 2020 11:47 AM
To
We have weights and priority/multifactor.
Jeff
From: Sistemas NLHPC [mailto:siste...@nlhpc.cl]
Sent: Thursday, December 05, 2019 12:01 PM
To: Sarlo, Jeffrey S; Slurm User Community List
Subject: Re: [slurm-users] Slurm configuration, Weight Parameter
Thanks, Jeff!
We upgraded Slurm to 18.08.4
We had a job queued waiting for resources and when we changed the debug level,
we were able to get the following in the slurmctld.log file.
[2019-08-02T10:03:47.347] debug2: JobId=804633 being held, the job is at or
exceeds assoc 50(jeff/(null)/(null)) group max tres(cpu) minutes of 3000 of
From: slurm-users on behalf of Sarlo,
Jeffrey S
Sent: 25 July 2019 13:04
To: Slurm User Community List
Subject: Re: [slurm-users] Slurm node weights
This is the fix if you want to modify the code and rebuild:
https://github.com/SchedMD/slurm/commit/f66a2a3e2064
deploy that new
version for quite a while. In the meantime, does anyone know if there is any fix or
alternative strategy that might help us to achieve the same result?
Best regards,
David
From: slurm-users on behalf of Sarlo,
Jeffrey S
Sent: 25 July 2019 12:26
Which version of Slurm are you running? I know some of the earlier versions of
18.08 had a bug and node weights were not working.
Jeff
From: slurm-users on behalf of David
Baker
Sent: Thursday, July 25, 2019 6:09 AM
To: slurm-users@lists.schedmd.com
Subjec