The following entry in partition.conf:
PartitionName=CLUSTER Default=yes State=UP nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]
causes slurmctld to die with a fatal error:
[2013-07-10T16:03:22.923] error: find_node_record: lookup failure for gpu-[2]-[4]
[2013-07-10T16:03:22.923] error: node_name2bitmap: invalid node specified gpu-[2]-[4]
[2013-07-10T16:03:22.923] error: find_node_record: lookup failure for 6-16]
[2013-07-10T16:03:22.923] error: node_name2bitmap: invalid node specified 6-16]
[2013-07-10T16:03:22.923] fatal: Invalid node names in partition CLUSTER
The node does exist:
# host gpu-2-4
gpu-2-4.local has address 10.240.31.235
If I take gpu-2-4 out of the partition file, slurmctld starts and shows:
CLUSTER up infinite 23 idle gpu-1-[5-12,14-17],gpu-2-[6-7,9-16],gpu-3-9
CLUSTER up infinite 2 down* gpu-1-13,gpu-2-8
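To check which node names the original nodes= expression is meant to cover, here is a minimal Python sketch of hostlist expansion. It is not Slurm's parser. The helper names (split_top_level, expand_range_list, expand_host, expand_hostlist) are hypothetical, and the sketch assumes plain numeric ranges with no zero padding. It expands the expression as written, and also the same list written with a single bracket group per name (the style slurmctld uses in the output above), and compares the two.

```python
import itertools
import re


def split_top_level(expr: str) -> list[str]:
    """Split a hostlist expression on commas that are NOT inside [...]."""
    parts, current, depth = [], [], 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_range_list(body: str) -> list[str]:
    """Expand a bracket body such as '4,6-16' into ['4', '6', '7', ..., '16']."""
    out = []
    for item in body.split(","):
        if "-" in item:
            lo, hi = item.split("-")
            out.extend(str(n) for n in range(int(lo), int(hi) + 1))
        else:
            out.append(item)
    return out


def expand_host(pattern: str) -> list[str]:
    """Expand one name that may contain several [...] groups (cartesian product)."""
    # With a capture group, re.split alternates literal text and bracket bodies.
    pieces = re.split(r"\[([^\]]*)\]", pattern)
    choices = [[p] if i % 2 == 0 else expand_range_list(p)
               for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def expand_hostlist(expr: str) -> list[str]:
    """Expand a whole comma-separated hostlist expression, in order."""
    hosts = []
    for part in split_top_level(expr):
        hosts.extend(expand_host(part))
    return hosts


if __name__ == "__main__":
    original = expand_hostlist("gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]")
    single = expand_hostlist("gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9")
    print(original == single)  # True
    print(len(original))       # 27
```

If both spellings cover the same hosts, the single-group form (gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9) may be worth trying in the partition entry, since it matches how slurmctld itself prints node lists. Whether this Slurm version accepts more than one bracket group per name is not established by this sketch.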
What's the problem with slurmctld's node name lookup?
What would be the correct syntax for the partition file?
Thanks
Eva