[slurm-dev] Slurm job steps are not starting, but job is marked as running

2016-01-19 Thread Sergey Zhumatiy
Hello! I'm using slurm-2.5.6 (and I cannot upgrade for serious reasons), and today I ran into a very strange situation: I submitted a job and after some time it started running (squeue shows it as R), but on the target nodes there are no job processes (job steps). In the slurm logs on the nodes I cannot see any

[slurm-dev] slurmstepd (Slurm 15.08.6)

2016-01-19 Thread Danny Rotscher
Hello, could anyone tell me, what's the reason for the following error message: slurmstepd: xcgroup: rmdir(/cgroup/freezer/slurm/uid_14822): Device or resource busy Thanks in advance. Kind regards, Danny

[slurm-dev] Re: Jobs stuck in completing state

2016-01-19 Thread Christopher Samuel
On 18/01/16 21:09, Danny Rotscher wrote:
> since we upgrade to Slurm 15.08 we have the problem, that so many jobs
> (>2000) stuck in completing state. Is it a known problem of Slurm?
We're running 15.08.x on 3 Intel clusters and a BlueGene/Q, no such issue here I'm afraid. Best of luck! Chris -

[slurm-dev] NodeName and PartitionName format in slurm.conf

2016-01-19 Thread Andrus, Brian Contractor
All, I am testing our slurm to replace our torque/moab setup here. The issue I have is to try and put all our node names in the NodeName and PartitionName entries. In our cluster, we name our nodes compute-- That seems to be problem enough with the abilities to use ranges in slurm, but it is co