[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Christopher Samuel
On 18/08/16 20:32, Ole Holm Nielsen wrote:
> Chris Samuel in a previous posting had some more cautious advice about upgrading slurmd daemons! I hope that Chris may offer additional insights.
It's just that if I don't have to upgrade nodes running jobs, I'd really rather avoid it. I know it's su

[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Christopher Samuel
On 18/08/16 21:07, Barbara Krasovec wrote:
> scontrol reconfigure
"scontrol reconfigure" will pick up most, but not all, parameters. For instance, adding or removing nodes requires daemon restarts.
cheers,
Chris
-- Christopher Samuel, Senior Systems Administrator, VLSCI - Victorian Life Sciences
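Chris's rule of thumb can be turned into a quick pre-flight check that compares the old and new slurm.conf. This is a minimal sketch, assuming the only restart trigger we look for is a change to the NodeName lines (nodes added or removed); the helper names `node_lines` and `needs_daemon_restart` are hypothetical, and since other parameters may also need a restart, a False result only means "no node changes found".

```python
from __future__ import annotations


def node_lines(conf_text: str) -> set[str]:
    """Collect NodeName definitions, ignoring comments and blank lines."""
    lines = set()
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.lower().startswith("nodename="):
            lines.add(" ".join(line.split()))  # normalise internal whitespace
    return lines


def needs_daemon_restart(old_conf: str, new_conf: str) -> bool:
    """True if any NodeName line differs between the two configs.

    Adding or removing nodes is not handled by `scontrol reconfigure`,
    so the daemons must be restarted. This check is conservative: any
    edit to a NodeName line is flagged. Other parameters may also need
    a restart, so False does not prove reconfigure is sufficient.
    """
    return node_lines(old_conf) != node_lines(new_conf)


if __name__ == "__main__":
    old = "ClusterName=demo\nNodeName=node[001-010] CPUs=32\n"
    new = "ClusterName=demo\nNodeName=node[001-012] CPUs=32\n"
    print(needs_daemon_restart(old, new))
```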

[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Christopher Samuel
On 17/08/16 23:36, Ole Holm Nielsen wrote:
> Obviously upgrading slurmds that are running jobs is quite tricky! I have some questions:
>
> 1. Can't you replace the health check by a global scontrol like this?
> scontrol update NodeName= State=drain Reason="Upgrading slurmd"
Yes you co
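Ole's global drain can be scripted so that the same node list is used for both the drain and the later resume. Draining stops new jobs from starting on the nodes while running jobs continue. This is a minimal sketch: `drain_command` and `resume_command` are hypothetical helpers, and `node[001-020]` is a placeholder hostlist; the calls only build the `scontrol update` argument lists and print a dry run.

```python
import shlex


def drain_command(nodelist: str, reason: str) -> list:
    """Build the scontrol call that drains nodes before a slurmd upgrade."""
    return ["scontrol", "update", f"NodeName={nodelist}",
            "State=DRAIN", f"Reason={reason}"]


def resume_command(nodelist: str) -> list:
    """Build the scontrol call that returns drained nodes to service."""
    return ["scontrol", "update", f"NodeName={nodelist}", "State=RESUME"]


if __name__ == "__main__":
    nodes = "node[001-020]"  # hypothetical hostlist; substitute your own
    for cmd in (drain_command(nodes, "Upgrading slurmd"), resume_command(nodes)):
        # Dry run only; subprocess.run(cmd, check=True) would execute it.
        print(shlex.join(cmd))
```

Passing an argument list (rather than a shell string) keeps the space in the reason text inside a single argument without extra quoting.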

[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Christopher Samuel
On 17/08/16 04:42, Ole Holm Nielsen wrote:
> Question: Can anyone provide slurmdbd upgrade instructions which work correctly on CentOS 7 (and other OSes using systemd)?
We don't tend to start slurmdbd via an init script when doing an upgrade; instead we run it by hand, adding "-D -v -v" so it d

[slurm-dev] Preemption stats

2016-08-18 Thread Jeff White
Does Slurm have a way of showing historical data of how many jobs have been preempted by other jobs? -- Jeff White HPC Systems Engineer Information Technology Services - WSU

[slurm-dev] Re: Remote Visualization and Slurm

2016-08-18 Thread Andrew Elwell
> If anyone has a working remote visualization cluster that integrates well with slurm, I would love to hear from you.
We're using 'strudel' https://www.massive.org.au/userguide/cluster-instructions/strudel and our local instructions are https://support.pawsey.org.au/documentation/display/US/Ge

[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Barbara Krasovec
Well, if you're upgrading packages that are already installed, you can do `yum update slurm-sql slurm-munge slurm-slurmdbd` or `rpm -Uvh` (the U switch upgrades already installed packages). Cheers, Barbara On 18/08/16 19:50, Balaji Deivam wrote: Re: [slurm-dev] Re: Slurm Upgrade f
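When a build produces many RPMs, the `rpm -Uvh` route needs just the three packages Barbara names. This is a minimal sketch, assuming the usual NAME-VERSION-RELEASE.ARCH.rpm file naming; `rpm_name` and `upgrade_command` are hypothetical helpers, and the file names in the usage section are invented to follow the pattern in Balaji's listing.

```python
from __future__ import annotations

import shlex

# Packages named in Barbara's reply for upgrading slurmdbd.
DBD_PACKAGES = {"slurm-sql", "slurm-munge", "slurm-slurmdbd"}


def rpm_name(filename: str) -> str:
    """Return the package name from NAME-VERSION-RELEASE.ARCH.rpm."""
    stem = filename[:-len(".rpm")] if filename.endswith(".rpm") else filename
    # Strip the last two hyphen-separated fields (version and release.arch).
    return stem.rsplit("-", 2)[0]


def upgrade_command(filenames: list[str]) -> list[str]:
    """Build an `rpm -Uvh` call for only the slurmdbd-related packages."""
    chosen = sorted(f for f in filenames if rpm_name(f) in DBD_PACKAGES)
    return ["rpm", "-Uvh", *chosen]


if __name__ == "__main__":
    built = [  # hypothetical file names
        "slurm-15.08.12-1.el6.x86_64.rpm",
        "slurm-perlapi-15.08.12-1.el6.x86_64.rpm",
        "slurm-sql-15.08.12-1.el6.x86_64.rpm",
        "slurm-munge-15.08.12-1.el6.x86_64.rpm",
        "slurm-slurmdbd-15.08.12-1.el6.x86_64.rpm",
    ]
    print(shlex.join(upgrade_command(built)))
```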

[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Balaji Deivam
Thanks for your response. I have built the RPMs and got the files below. Then I installed only the three RPMs you mentioned. Is this right? -rw-r- 1 root root 25680316 Aug 18 12:44 slurm-15.08.12-1.el6.x86_64.rpm -rw-r- 1 root root 451160 Aug 18 12:44 slurm-perlapi-15.08.1

[slurm-dev] Re: jobs rejected for no reason

2016-08-18 Thread Ade Fewings
Hi Antonia, Hmm... we set the default memory per CPU for the partitions as well as having the node memory populated, so I may be unable to help here, but I'm curious as to why your node reports "RealMemory=1". I don't know if that is 'normal' in some cases, to be honest. If I request an impo

[slurm-dev] Re: jobs rejected for no reason

2016-08-18 Thread Antonia Mey
Hi Ade, looking at the node configuration this should be ok: NodeName=node011 Arch=x86_64 CoresPerSocket=1 CPUAlloc=4 CPUErr=0 CPUTot=32 CPULoad=2.79 Features=(null) Gres=gpu:4 NodeAddr=node011 NodeHostName=node011 Version=14.11 OS=Linux RealMemory=1 AllocMem=0 Sockets=32 Boards=1
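Ade's question about "RealMemory=1" is worth checking mechanically: 1 MB is Slurm's default when RealMemory is not set for the node in slurm.conf, so if the job script asks for memory, a request above that cannot be met, which is one possible explanation for the rejection. This is a rough sketch under that assumption; `parse_node` and `can_fit` are hypothetical helpers that only compare CPUs and memory and ignore GRES, features and partition limits.

```python
from __future__ import annotations

# Node line as posted above (truncated in the archive).
NODE011 = ("NodeName=node011 Arch=x86_64 CoresPerSocket=1 CPUAlloc=4 CPUErr=0 "
           "CPUTot=32 CPULoad=2.79 Features=(null) Gres=gpu:4 NodeAddr=node011 "
           "NodeHostName=node011 Version=14.11 OS=Linux RealMemory=1 AllocMem=0 "
           "Sockets=32 Boards=1")


def parse_node(line: str) -> dict[str, str]:
    """Turn `scontrol show node` key=value output into a dict.

    Tokens without '=' (e.g. parts of values containing spaces) are
    skipped in this simplified sketch.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def can_fit(node: dict[str, str], cpus: int, mem_mb: int) -> bool:
    """Rough check of a request against the node's configured CPUs and memory."""
    return cpus <= int(node["CPUTot"]) and mem_mb <= int(node["RealMemory"])


if __name__ == "__main__":
    node = parse_node(NODE011)
    print(can_fit(node, cpus=4, mem_mb=4000))  # False: RealMemory is only 1 MB
```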

[slurm-dev] Re: jobs rejected for no reason

2016-08-18 Thread Ade Fewings
Hi Antonia, I think it's quite likely to be something to do with the nodes in that partition. Does 'scontrol show node=' show all the requested capabilities? ~~ Ade From: Antonia Mey Sent: 18 August 2016 15:55:12 To: slurm-dev Subject: [slurm-dev] jobs rejec

[slurm-dev] jobs rejected for no reason

2016-08-18 Thread Antonia Mey
Hi all, I am a bit out of my depth here, and apologies if this is a very trivial problem. Slurm rejects jobs due to insufficient resources (sbatch: error: Batch job submission failed: Requested node configuration is not available), even though the partition should definitely accept the following job #

[slurm-dev] Re: SLURM job's email notification does not work

2016-08-18 Thread Fatih Öztürk
Dear Doug and Christian, Thank you very much for your help. I was able to solve the problem by making the changes you suggested (applied on the head node, then ran scontrol reconfigure). Now the emails are sent successfully. Best Regards, Fatih

[slurm-dev] Re: SLURM job's email notification does not work

2016-08-18 Thread Douglas Jacobsen
Email is only sent by slurmctld, so you'll need to change slurm.conf there and at least do an `scontrol reconfigure`; then perhaps it'll start working. -Doug Doug Jacobsen, Ph.D. NERSC Computer Systems Engineer National Energy Research Scientific Computing Center
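Because only slurmctld sends mail, the copy of slurm.conf that matters is the controller's. A small lookup can confirm that MailProg is actually set there (and not just commented out) before running `scontrol reconfigure`. This is a minimal sketch; `conf_value` is a hypothetical helper that handles single-valued parameters only, compares keys case-insensitively, and keeps the last occurrence if a key appears twice.

```python
from typing import Optional


def conf_value(conf_text: str, key: str) -> Optional[str]:
    """Return the value of a single-valued slurm.conf parameter, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        name, sep, rest = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            value = rest.strip()  # this sketch keeps the last occurrence
    return value


if __name__ == "__main__":
    # Hypothetical excerpt of the controller's slurm.conf.
    controller_conf = "ClusterName=demo\n#MailProg=/bin/mail\nMailProg=/usr/bin/mailx\n"
    print(conf_value(controller_conf, "MailProg"))
```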

[slurm-dev] Re: SLURM job's email notification does not work

2016-08-18 Thread Fatih Öztürk
Dear Christian, What I did:
1) Changed computenode3 status to DRAIN
2) Changed slurm.conf only on computenode3, adding MailProg=/usr/bin/mailx at the end of slurm.conf
3) Restarted the munged and slurmd services
4) Changed computenode3 status to RESUME
Both with root and my own user id, email

[slurm-dev] Re: SLURM job's email notification does not work

2016-08-18 Thread Christian Goll
Hello Fatih, did you set the variable MailProg to the right value, e.g. MailProg=/usr/bin/mailx in slurm.conf? kind regards, Christian
On 18.08.2016 14:44, Fatih Öztürk wrote:
> Hello,
> I have a problem with email notifications for jobs. I would appreciate it if you could help me.
> We ha

[slurm-dev] SLURM job's email notification does not work

2016-08-18 Thread Fatih Öztürk
Hello, I have a problem with email notifications for jobs. I would appreciate it if you could help me. We have a SLURM cluster: 1 head node and about 20 compute nodes. Users run their jobs only from the head node, with their own credentials. As an example, if I run a job like the one below on the

[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Barbara Krasovec
Hello!
On 18/08/16 12:33, Ole Holm Nielsen wrote:
> On 08/17/2016 03:49 PM, Barbara Krasovec wrote:
>> I upgraded SLURM from 15.08 to 16.05 without draining the nodes and without losing any jobs. This was my procedure:
>> I increased timeouts in slurm.conf:
>> SlurmctldTimeout=3600
>> SlurmdTimeout=3600
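Barbara's first step, raising SlurmctldTimeout and SlurmdTimeout to 3600, is a text edit that is easy to get wrong by hand across several copies of slurm.conf. This is a minimal sketch; `set_params` is a hypothetical helper that rewrites single-valued parameters, appends any that are missing, and returns the previous values so they can be put back after the upgrade (a design choice of the sketch, not something stated in the thread). It only edits text; getting the daemons to read the new values is a separate step.

```python
from __future__ import annotations


def set_params(conf_text: str, updates: dict[str, str]) -> tuple[str, dict[str, str | None]]:
    """Set single-valued slurm.conf parameters.

    Returns the new text and the previous values (None if absent).
    Keys are matched case-insensitively; a replaced line loses any
    trailing comment.
    """
    previous: dict[str, str | None] = {key: None for key in updates}
    lowered = {key.lower(): key for key in updates}
    out: list[str] = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore trailing comments
        name, sep, value = line.partition("=")
        key = lowered.get(name.strip().lower()) if sep else None
        if key is None:
            out.append(raw)  # keep unrelated lines verbatim
            continue
        previous[key] = value.strip()
        out.append(f"{key}={updates[key]}")
    # Append parameters that were not present at all.
    for key, new_value in updates.items():
        if previous[key] is None:
            out.append(f"{key}={new_value}")
    return "\n".join(out) + "\n", previous


if __name__ == "__main__":
    conf = "SlurmctldTimeout=120\nSlurmdTimeout=300\n"  # hypothetical values
    new_conf, old = set_params(conf, {"SlurmctldTimeout": "3600", "SlurmdTimeout": "3600"})
    print(new_conf, old)
```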

[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Barbara Krasovec
Hello! On 17/08/16 23:59, Balaji Deivam wrote: Re: [slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed Hi, Can someone give me the detailed steps for "Upgrade the slurmdbd daemon"? I have downloaded the Slurm source tar file and am looking for how to upgrade o

[slurm-dev] Re: Slurm Upgrade from 14.11.3 to 16.5.3 - Instructions needed

2016-08-18 Thread Ole Holm Nielsen
On 08/17/2016 03:49 PM, Barbara Krasovec wrote:
> I upgraded SLURM from 15.08 to 16.05 without draining the nodes and without losing any jobs. This was my procedure:
> I increased timeouts in slurm.conf:
> SlurmctldTimeout=3600
> SlurmdTimeout=3600
Question: When you change parameters in slurm.conf