[slurm-dev] OFFLOAD_DEVICES not set during prolog, how to find which MICs are allocated then?

2013-08-21 Thread Christopher Samuel
Hi folks, Our Xeon Phi nodes are all diskless and sadly Intel's libraries hard-code the path /tmp/coi_procs as a place to store files to push to the Xeon Phi cards. This means after a few processes /tmp gets full of copies of MKL, etc. :-( We had w

[slurm-dev] Slurm 2.6.0 MIC GRES support does not set OFFLOAD_DEVICES=-1 if no GRES requested

2013-08-21 Thread Christopher Samuel
Hi folks, I'm trying to set up SLURM 2.6.0 GRES support for MIC cards and I've found that whilst the documentation says: http://slurm.schedmd.com/gres.html # If no MICs are reserved via GRES, the OFFLOAD_DEVICES # variable is set to -1. that doesn

[slurm-dev] How does sacct honor the "-S" and "-E" option?

2013-08-21 Thread Yong Qin
Hi, This has been puzzling me for a while. So I'm hoping somebody can clarify it for me. In short, when I use "sacct -S $T1 -E $T2" I often get lots of jobs that are completely out of the range of ($T1, $T2). For example, $ sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,start,end

[slurm-dev] Re: Fwd: Slurm Question

2013-08-21 Thread Taras Shapovalov
Hi Matthew, We have not seen this error before. I suggest stopping slurmd on dena1 and starting it in a terminal with debug messages and system call traces (you will see which files it opens and, hopefully, some details about the errors): strace /path/to/slurmd -D -vvv then submit a job to dena1 again.

[slurm-dev] Fwd: Slurm Question

2013-08-21 Thread Matthew Russell
Hi, I upgraded slurm on my Bright 5.2 Cluster from 2.2.7 to 2.4.2. Recently I've been having issues running slurm processes though. I've read some postings (like this) but I s

[slurm-dev] Re: Required node not available (down or drained)

2013-08-21 Thread Nikita Burtsev
slurmctld is the management process, and since you have access to squeue/sinfo information it is running just fine. You need to check whether slurmd (which is the agent part) is running on your nodes, i.e. VM-[669-671] -- Nikita Burtsev On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari Nand

[slurm-dev] Re: Required node not available (down or drained)

2013-08-21 Thread Sivasangari Nandy
I have tried: /etc/init.d/slurm-llnl start [ ok ] Starting slurm central management daemon: slurmctld. /usr/sbin/slurmctld already running. And: scontrol show slurmd scontrol: error: slurm_slurmd_info: Connection refused slurm_load_slurmd_status: Connection refused Hum how to procee

[slurm-dev] Re: Required node not available (down or drained)

2013-08-21 Thread Danny Auble
Check your slurmd log. It doesn't appear that slurmd is running. Sivasangari Nandy wrote: > Hello, > I'm trying to use Slurm for the first time, and I got a problem with nodes I think. > I have this message when I used squeue : > root@VM-667:~# squeue

[slurm-dev] Required node not available (down or drained)

2013-08-21 Thread Sivasangari Nandy
Hello, I'm trying to use Slurm for the first time, and I got a problem with nodes I think. I have this message when I used squeue : root@VM-667:~# squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 50 SLURM-deb test.sh root PD 0:00 1 (Re