The following entry in partition.conf:
PartitionName=CLUSTER Default=yes State=UP nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]


causes slurmctld to crash:

[2013-07-10T16:03:22.923] error: find_node_record: lookup failure for gpu-[2]-[4]
[2013-07-10T16:03:22.923] error: node_name2bitmap: invalid node specified gpu-[2]-[4]
[2013-07-10T16:03:22.923] error: find_node_record: lookup failure for 6-16]
[2013-07-10T16:03:22.923] error: node_name2bitmap: invalid node specified 6-16]
[2013-07-10T16:03:22.923] fatal: Invalid node names in partition CLUSTER



The node does exist:
#  host gpu-2-4
gpu-2-4.local has address 10.240.31.235


If I take gpu-2-4 out of the partition file, then slurmctld starts and
sinfo shows:

CLUSTER      up   infinite     23   idle gpu-1-[5-12,14-17],gpu-2-[6-7,9-16],gpu-3-9
CLUSTER      up   infinite      2  down* gpu-1-13,gpu-2-8
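
As a sanity check on the compressed node list that sinfo prints above, here
is a minimal Python sketch that expands names with a single trailing bracket
(the form sinfo uses) and counts them. The helpers split_top_level and expand
are hypothetical names made up for this illustration; this is not Slurm's own
hostlist code and it makes no claim about how slurmctld parses the partition
file.

```python
import re

def split_top_level(expr):
    """Split on commas that are not inside [...] brackets."""
    parts, depth, cur = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    if cur:
        parts.append("".join(cur))
    return parts

def expand(expr):
    """Expand names such as 'gpu-2-[6-7,9-16]' (one trailing bracket per name)."""
    names = []
    for part in split_top_level(expr):
        m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]", part)
        if not m:
            names.append(part)  # plain host name, e.g. gpu-3-9
            continue
        prefix, ranges = m.groups()
        for r in ranges.split(","):
            lo, _, hi = r.partition("-")
            hi = hi or lo  # a single number such as "4" is a one-element range
            names.extend(f"{prefix}{i}" for i in range(int(lo), int(hi) + 1))
    return names

idle = expand("gpu-1-[5-12,14-17],gpu-2-[6-7,9-16],gpu-3-9")
print(len(idle))  # 23
```

The expanded list has 23 names, which matches the idle node count in the
sinfo line, so the single-trailing-bracket form at least round-trips the way
I would expect.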

What's the problem with slurmctld's node name lookup?
What would be the correct syntax for the partition file?

Thanks
Eva
