Hi Eva!

As Sergio said, you have to define the compute nodes with "NodeName=..." and
then define partitions containing those compute nodes with
"PartitionName=... Nodes=...", without including the head nodes or the login
nodes. You can also set the parameter "AllocNodes=..." in slurm.conf, where
we usually list only the login nodes, in order to disallow submission from
any nodes other than the login nodes.
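
As a rough sketch (the node names, the CPU/memory values and the login node
name below are only placeholders for your own setup), the relevant
slurm.conf lines could look like this:

   # compute nodes only -- no NodeName line for the head or login nodes
   NodeName=hpc-0-4,hpc-0-6 CPUs=... RealMemory=... State=UNKNOWN
   # the partition contains only compute nodes; AllocNodes restricts
   # which hosts are allowed to submit jobs to this partition
   PartitionName=active Nodes=hpc-0-4,hpc-0-6 AllocNodes=<login-node> MaxTime=2:00:00 State=UP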

So my question now is: is node hpcdev-005.sdsc.edu a login node or a
master/admin node? In other words, did you do the submission from that node
or from a different one? Because if it is the login node, then there is no
error at all; this is the default behaviour of salloc.

By default (you can change this), salloc gives you a shell on the node where
the submission took place, and from there you can execute programs on the
compute nodes with srun commands.

So in case you want an interactive shell on a compute node, you should
execute:

"salloc -N1 -p active srun -N1 --pty sh"

or directly an srun command (without salloc involved):

"srun -N1 -p active --pty sh"

Best Regards,
Chrysovalantis Paschoulas



On 09/12/2014 09:19 AM, Sergio Iserte wrote:
Hello Eva,
you must remove the management nodes from the field "Nodes" of the 
"PartitionName" parameter.

With the slurm.conf file it would be easier to write an example; anyway,
this should work!

Regards,
Sergio.

2014-09-12 9:06 GMT+02:00 Uwe Sauter <uwe.sauter...@gmail.com>:

Hi Eva,

if you don't want to use the controller node for jobs, the easiest way
is to not configure it as a node at all. That means you don't need a line like

NodeName=hpc-0-5 RealMemory=....

for the controller.


A program/user can find out which nodes are allocated by looking at the
environment variables. Try running salloc and then

$ env | grep SLURM

Here is an example output:

SLURM_NODELIST=n523601
SLURM_NODE_ALIASES=(null)
SLURM_NNODES=1
SLURM_JOBID=6437
SLURM_TASKS_PER_NODE=40
SLURM_JOB_ID=6437
SLURM_SUBMIT_DIR=/nfs/admins/adm17
SLURM_JOB_NODELIST=n523601
SLURM_JOB_CPUS_PER_NODE=40
SLURM_SUBMIT_HOST=frontend
SLURM_JOB_PARTITION=foo
SLURM_JOB_NUM_NODES=1
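
So within the allocation a user can get the allocated hosts directly from
these variables, without scontrol, e.g. (continuing the example above):

$ echo $SLURM_JOB_NODELIST
n523601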



Regards,

       Uwe



On 12.09.2014 at 00:45, Eva Hocks wrote:



I am trying to configure the latest Slurm (14.03) and am running into a
problem: I cannot prevent Slurm from running jobs on the control node.

sinfo shows 3 nodes configured in the slurm.conf:
active       up    2:00:00      1  down* hpc-0-5
active       up    2:00:00      1    mix hpc-0-4
active       up    2:00:00      1   idle hpc-0-6


but when I use salloc I end up on the head node


$ salloc -N 1 -p active sh
salloc: Granted job allocation 16
sh-4.1$ hostname
hpcdev-005.sdsc.edu


That node is not part of the "active" partition but Slurm still uses it.
How? The allocation, by the way, is for NodeList=hpc-0-4, and the user can
log in to that node without a problem, but Slurm doesn't run the sh on that
node for the user.

Also, how can a user find out which nodes are allocated without having to
run the scontrol command? Is there an option in salloc to return the
host names?

Thanks
Eva




--
Sergio Iserte Agut, research assistant,
High Performance Computing & Architecture
Jaume I University (Castellón, Spain)





