Hi Eva!

As Sergio said, you have to define the compute nodes with "NodeName=..." and then define partitions containing only those compute nodes with "PartitionName=... Nodes=...", leaving out the head and login nodes. In addition, you can set the "AllocNodes=..." parameter in slurm.conf, which we usually give the login nodes only, so that allocations cannot be requested from any nodes other than the login nodes.
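[Editorial note: to make that concrete, here is a minimal slurm.conf sketch along those lines. It is an illustration only: the CPU and memory figures and the login host name "login-0-1" are made up, and the node list should contain compute nodes only, never the controller or login hosts.]

  # Compute nodes only -- no NodeName entry for the controller or login hosts
  NodeName=hpc-0-[4-6] CPUs=16 RealMemory=64000 State=UNKNOWN

  # Partition built from those compute nodes; AllocNodes limits which
  # hosts are allowed to request allocations (here only the login node)
  PartitionName=active Nodes=hpc-0-[4-6] AllocNodes=login-0-1 Default=YES MaxTime=02:00:00 State=UP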
So my question now is: is node hpcdev-005.sdsc.edu a login node or a master/admin node? In other words, did you run the submission from that node or from a different one? If it is the login node, then there is no error at all; this is the default behaviour of salloc. By default, salloc (you can change this) gives you a shell on the node where the submission took place, and from that shell you can use srun commands to execute programs on the compute nodes. So if you want an interactive shell on a compute node, you should execute:

  salloc -N1 -p active srun -N1 --pty sh

or run srun directly (without salloc involved):

  srun -N1 -p active --pty sh

Best Regards,
Chrysovalantis Paschoulas

On 09/12/2014 09:19 AM, Sergio Iserte wrote:

Hello Eva,
you must remove the management nodes from the "Nodes" field of the "PartitionName" parameter. With the slurm.conf file it would be easier to write an example; anyway, this should work!
Regards,
Sergio.

2014-09-12 9:06 GMT+02:00 Uwe Sauter <uwe.sauter...@gmail.com>:

Hi Eva,

if you don't want to use the controller node for jobs, the easiest way is to not configure it as a node at all. Meaning you don't need a line like

  NodeName=hpc-0-5 RealMemory=...

for the controller.

A program/user can find out which nodes are allocated by looking at the environment variables. Try running salloc and then

  $ env | grep SLURM

Here is an example output (a short sketch of reading these variables from a script follows after the quoted message below):

  SLURM_NODELIST=n523601
  SLURM_NODE_ALIASES=(null)
  SLURM_NNODES=1
  SLURM_JOBID=6437
  SLURM_TASKS_PER_NODE=40
  SLURM_JOB_ID=6437
  SLURM_SUBMIT_DIR=/nfs/admins/adm17
  SLURM_JOB_NODELIST=n523601
  SLURM_JOB_CPUS_PER_NODE=40
  SLURM_SUBMIT_HOST=frontend
  SLURM_JOB_PARTITION=foo
  SLURM_JOB_NUM_NODES=1

Regards,
Uwe

On 12.09.2014 at 00:45, Eva Hocks wrote:
I am trying to configure the latest slurm 14.03 and am running into a problem trying to prevent slurm from running jobs on the control node. sinfo shows the 3 nodes configured in slurm.conf:

  active up 2:00:00 1 down* hpc-0-5
  active up 2:00:00 1 mix   hpc-0-4
  active up 2:00:00 1 idle  hpc-0-6

but when I use salloc I end up on the head node:

  $ salloc -N 1 -p active sh
  salloc: Granted job allocation 16
  sh-4.1$ hostname
  hpcdev-005.sdsc.edu

That node is not part of the "active" partition but slurm still uses it. How? The allocation, by the way, is for NodeList=hpc-0-4 and the user can log in to that node without a problem, but slurm doesn't run the sh on that node for the user.

Also, how can a user find out which nodes are allocated without having to run the scontrol command? Is there an option in salloc to return the host names?

Thanks
Eva
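[Editorial note: following up on the SLURM_* variables shown in Uwe's reply above, here is a minimal, hedged sketch of how a script run inside an allocation might print the allocated host names. It only reads environment variables that salloc/sbatch set; "scontrol show hostnames" is used purely to expand the compressed host list locally, it does not query the controller.]

  #!/bin/bash
  # Inside an allocation (salloc/sbatch), SLURM_JOB_NODELIST holds the
  # compressed node list, e.g. "hpc-0-[4,6]"
  echo "Allocated nodes: $SLURM_JOB_NODELIST"

  # Expand the compressed list to one host name per line
  # (scontrol show hostnames only parses the expression)
  scontrol show hostnames "$SLURM_JOB_NODELIST"

  # Number of nodes in the allocation
  echo "Node count: $SLURM_JOB_NUM_NODES"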
--
Sergio Iserte Agut, research assistant,
High Performance Computing & Architecture
Jaume I University (Castellón, Spain)