Re: [slurm-users] status of cloud nodes

2019-06-18 Thread nathan norton
Hi, I just tried running that command, but it only shows nodes that are up and running; it doesn't tell me about any nodes that are down and turned off. As an example, please see below. There is a job running that should be using the 100 nodes, but only 52 are allocated (plus 2 down* (that I know about

Re: [slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread mercan
Hi; Sorry, as you can see, I made a mistake again. I wrote two different directories: "The owner of the /var/run/slurm-llnl directory and the slurmctld.pid and slurmd.pid files should be "noki" user. chown -R noki:root /var/spool/slurm-llnl" You should run: chown -R noki:root

Re: [slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread Noki Lee
Hi, slurm-users and mercan. I tried what you said.
noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
total 92
-rw--- 1 noki root 198 Jun 19 11:36 assoc_mgr_state
-rw--- 1 noki root 198 Jun 18 20:31

[slurm-users] Manage access to specialized nodes: Reservation, Queue, or Features

2019-06-18 Thread E.M. Dragowsky
Greetings -- We're running Slurm 17.02.2. - We have implemented OnDemand in our cluster, including the Jupyter app across all the compute nodes. The Interactive Desktop application, however, is installed on a small set of compute nodes during an extended validation period.

Re: [slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread mercan
Hi; I did not notice the SlurmUser=noki line. The owner of the /var/run/slurm-llnl directory and the slurmctld.pid and slurmd.pid files should be the "noki" user. chown -R noki:root /var/spool/slurm-llnl Regards; Ahmet M. On 18.06.2019 15:15, mercan wrote: Hi; The owner of the
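The ownership requirement described above can be checked before (or after) running chown. The following is a minimal sketch, not something posted in the thread: the helper name `owned_by` is hypothetical, and it assumes GNU coreutils `stat` (the `-c %U` form), which may differ on non-Linux systems.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the path in $1 is owned by the user in $2.
# Assumes GNU coreutils stat; BSD/macOS stat takes different flags.
owned_by() {
    [ "$(stat -c %U "$1")" = "$2" ]
}

# Example use with the directory and user named in the thread:
#   owned_by /var/run/slurm-llnl noki || echo "owner is not noki"
```

The check is read-only, so it can be run without sudo as long as the directory is readable.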

Re: [slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread mercan
Hi; The owner of the /var/run/slurm-llnl directory and the slurmctld.pid and slurmd.pid files should be the "slurm" user. Your files' owners are root and noki. chown -R slurm:slurm /var/spool/slurm-llnl Regards; Ahmet M. On 18.06.2019 15:03, Noki Lee wrote: Though SLURM works fine for job

[slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread Noki Lee
Though SLURM works fine for submitting, running, and queueing jobs, I get the minor error below. sudo systemctl status slurmd Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service: Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after start: No such file or directory sudo

Re: [slurm-users] status of cloud nodes

2019-06-18 Thread Sam Gallop (NBI)
Hi Nathan, The command I use to get the reason for failed nodes is ... 'sinfo -Ral'. If you need to extend the width of the output then ... 'sinfo -Ral -O reason:35,user,timestamp,statelong,nodelist'. Using the timestamp of the failure, look in the slurmd or slurmctld logs. --- Sam Gallop
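Alongside the reason listing above, it can help to reduce sinfo output to just the names of the down nodes. This is a minimal sketch under stated assumptions: the helper name `filter_down_nodes` is hypothetical, and it expects one `nodename state` pair per line, as produced by the node-oriented form `sinfo -h -N -o "%N %T"`.

```shell
#!/bin/sh
# Hypothetical helper: read "nodename state" pairs on stdin and print the
# names of nodes whose state starts with "down" (for example down or down*).
filter_down_nodes() {
    awk '$2 ~ /^down/ { print $1 }'
}

# Intended use on a live cluster (assumes sinfo is on PATH):
#   sinfo -h -N -o "%N %T" | filter_down_nodes | sort -u
```

The `sort -u` in the usage line removes duplicates, since a node that belongs to several partitions is listed once per partition.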

[slurm-users] status of cloud nodes

2019-06-18 Thread nathan norton
Hi all, I am using Slurm with a cloud provider and it is all working a treat. Let's say I have 100 nodes all working fine and able to be scheduled; everything works fine. $ srun -N100 hostname works fine. For some unknown reason, after machines shut down, for example over the weekend if no jobs