away while the daemon has crashed or failed to start.
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Fri, 2017-08-25 at 06:08 -0600, Ole Holm Nielsen wrote:
> On 08/25/2017 01:37 PM, Huijun HJ1 Ni wrote:
> > I installed slurm on my cluste
return slurm.ERROR
end
I'm not sure what the exact return value of job_desc.gres is; it may
not be nil. You'll have to test that part.
There are probably other ways to do this, but I like to use the Lua
plugin to communicate to my users what they have done wrong.
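Something like this, as a rough sketch (untested; I'm assuming
job_desc.gres comes back as nil or an empty string when no generic
resources were requested, which is exactly the part to verify):
   -- Sketch only: reject jobs that request no GRES at all.
   function slurm_job_submit(job_desc, part_list, submit_uid)
      if job_desc.gres == nil or job_desc.gres == "" then
         slurm.log_user("No GRES requested; add --gres=gpu:1 if you need a GPU.")
         return slurm.ERROR
      end
      return slurm.SUCCESS
   end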
--
Nicholas McCollum
HPC Systems Admin
You could also cap the MaxNodes submitted at job submission to 20 for
a given QoS:
-- Guard against a nil qos before matching on it:
if job_desc.qos ~= nil and string.match(job_desc.qos, "special") then
   job_desc.max_nodes = 20
end
Just a couple of ideas for you; there's probably a far better way to
do it!
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
%s.\n", test_min_nodes)
slurm.log_user("\n%s", error_verbose)
return slurm.ERROR
end
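The fragment above is just the tail of a minimum-node check; a sketch
of the whole thing might look like this (the threshold and the message
text are assumptions on my part):
   -- Sketch only: reject jobs below an assumed site minimum node count.
   local test_min_nodes = 2
   if job_desc.min_nodes ~= nil and job_desc.min_nodes < test_min_nodes then
      local error_verbose = string.format(
         "Jobs in this partition must request at least %s nodes.\n",
         test_min_nodes)
      slurm.log_user("\n%s", error_verbose)
      return slurm.ERROR
   end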
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Wed, 2017-06-28 at 13:51 -0600, Nathan Vance wrote:
> Correction (copy/pasted wrong thing): It was the
> "
\nJob must request a QoS using the --qos= flag.\n", asc_error_verbose)
asc_qos = "invalid"
end
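Reconstructed as a sketch (the code leading into that fragment is cut
off above, so the shape here is my guess; asc_qos and
asc_error_verbose are local variables of mine, not Slurm fields):
   -- Sketch only: flag jobs submitted without an explicit QoS.
   local asc_qos = job_desc.qos
   local asc_error_verbose = ""
   if asc_qos == nil then
      asc_error_verbose = "Job must request a QoS using the --qos= flag.\n"
      asc_qos = "invalid"
   end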
I'd be more than happy to share my job_submit.lua if anyone is
interested. I only ask that you share yours back.
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
I'm about to update from 15.08 to the latest SLURM in August and would appreciate
any notes you have on the process.
I'm especially interested in maintaining the DB as well as associations. I'd
also like to keep the pending job list if possible.
I've only got around 100,000 jobs in the DB so far, s
Set FirstJobId in your slurm.conf so that new jobs are numbered above
anything already recorded in your accounting database:
FirstJobId=12345
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Mon, 2017-04-10 at 12:45 -0700, Edward Walter wrote:
> Hi All,
>
> We recently experienced a RAID failure on one of our clusters
> running
>
ou to Ryan Cox for these excellent tools.
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Fri, 2017-03-17 at 08:59 -0700, Ryan Cox wrote:
> usage_in_bytes is not actually usage in bytes, by the way. It's
> often close but I have seen wildly differe
Hossein,
Try:
sacct -a --format=submit,start,partition,timeLimit,elapsed,TotalCPU,ReqMem,MaxRSS,AllocCPUS,job,state -X
Note the -X flag.
-X, --allocations: Only show cumulative statistics for each job, not
the intermediate steps.
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
for all. I think in the future I will edit my
job_submit.lua script and wait for all the jobs that have run through
it to finish before removing partitions.
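For instance, a temporary guard in job_submit.lua could turn away new
submissions to a partition you intend to remove (the partition name
here is made up):
   -- Sketch only: reject new jobs aimed at a partition being retired.
   if job_desc.partition == "old_partition" then
      slurm.log_user("old_partition is being retired; please resubmit to another partition.")
      return slurm.ERROR
   end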
My question for the group is: other than the above-mentioned method, is
there something I could have done differently to prevent SLURM f
have any users on it.
I'm sure someone has already blazed this trail before, but this is how
I am going about it.
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Thu, 2017-02-09 at 07:32 -0800, Ryan Cox wrote:
> John,
>
> We use /etc/security/li
Have you checked to make sure your GPUs are in persistence mode?
http://docs.nvidia.com/deploy/driver-persistence/
# nvidia-smi --persistence-mode=1
---
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Tue, 6 Dec 2016, David van Leeuwen wrote:
: 87%
User: user6
---
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Mon, 19 Sep 2016, Ryan Cox wrote:
I should probably add some example output:
Someone we need to talk to:
Node | Memory (GB) | CPUs
Hostname Alloc
uster that integrates well
with slurm, I would love to hear from you.
Thanks!
---
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
inters. I'm not an expert in
this, but I feel like this plugin could use better documentation as it is quite
flexible and powerful.
---
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Tue, 19 Jul 2016, Yong Qin wrote:
Hi,
I'm trying to
of memory so you might want to double check, but
this is how I would do it.
---
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Wed, 6 Jul 2016, Benjamin Redling wrote:
Hi,
On 07/06/2016 11:17, Laurent Facq wrote:
I would like to use only one partition with the 80 nodes,
orrect the slurmctld will crash. If there is an error, an easy
way to figure it out is to run 'slurmctld -Dv'; it will fail and
tell you what the issue is.
Hopefully this helps.
---
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On
some consideration I feel I might be able to set up something with a
prolog script, which I will test tomorrow.
Thanks!
---
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Sat, 4 Jun 2016, Pär Lindfors wrote:
On 06/03/2016 08:28 PM, Nicholas
n all submitted jobs. I've tried using /etc/sysconfig/slurm and
it appears this file is ignored. I would even be happy if this is
something that I could set in the job_submit.lua plugin, but I have not
seen a variable for something like this.
Any ideas?
---
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority