Hello,
I am testing SLURM 2.5-rc2 on a cluster of PS702 nodes (two POWER7 processors with 8 cores each, SMT disabled), and I have a problem with the task/affinity plugin when it sets the CPU masks.
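For reference, here is a small sketch of what "SMT disabled" may mean for CPU numbering on these nodes. It assumes two things that were not read from the node: POWER7 cores run up to 4 hardware threads, numbered consecutively per core (core c owns logical CPUs 4c to 4c+3), and only thread 0 of each core stays online with SMT off. Under those assumptions, the 16 cores would appear as logical CPUs 0, 4, 8, ..., 60:

```python
# Sketch: expected online logical CPU IDs on a PS702 node with SMT disabled.
# ASSUMPTIONS (not read from the node): 2 sockets x 8 cores, 4 hardware
# threads per POWER7 core, threads of a core numbered consecutively, and
# only thread 0 of each core left online when SMT is off.
SOCKETS = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 4  # assumed POWER7 SMT4


def expected_online_cpus(sockets: int, cores_per_socket: int,
                         threads_per_core: int) -> list[int]:
    """Return the first logical CPU of every core."""
    n_cores = sockets * cores_per_socket
    return [core * threads_per_core for core in range(n_cores)]


def cpus_to_mask(cpus: list[int]) -> int:
    """Build an affinity bitmask with one bit set per logical CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


online = expected_online_cpus(SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE)
print(online)                     # 16 IDs: 0, 4, 8, ..., 60
print(hex(cpus_to_mask(online)))  # 0x1111111111111111
```

If these assumptions hold, the mask of all online CPUs is exactly the 0x1111111111111111 value that appears in the FAILED lines below.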
When a job requests 4 or fewer tasks, the masks are set correctly; however, when it requests 5 or more, SLURM fails to set the masks:

cpu_bind=MASK - r09c3b3, task 5 5 [61232]: mask 0x1111111111111111 set FAILED
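To read the masks in these messages: each is a hex bitmask where bit i set means logical CPU i is allowed. The helper below is a hypothetical sketch (not part of SLURM) that lists the CPUs a mask selects:

```python
# Sketch: list the logical CPU IDs selected by a hex affinity mask such as
# those printed in the cpu_bind=MASK lines (bit i set => CPU i allowed).
def mask_to_cpus(mask_hex: str) -> list[int]:
    mask = int(mask_hex, 16)  # int() accepts the "0x" prefix with base 16
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


print(mask_to_cpus("0x1000"))              # [12]
print(mask_to_cpus("0x1111111111111111"))  # 16 CPUs: 0, 4, ..., 60
```

So the working masks (0x1, 0x10, 0x100, 0x1000) each name a single CPU (0, 4, 8 and 12), while the failing message reports a mask covering every fourth CPU from 0 to 60.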
What could be causing this problem? Could it be a configuration error, or a SLURM bug? Note that the CPU IDs are not consecutive, because these nodes allow multithreading to be enabled and disabled without rebooting.
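One way to see which CPU IDs are actually online is the Linux file /sys/devices/system/cpu/online, which uses the kernel "cpulist" format (for example "0,4,8-11"). The sketch below parses it and reports which CPUs of a given mask are offline; the function names are hypothetical, and 0x0002 is just an example mask taken from the log:

```python
# Sketch: compare a mask with the CPUs the Linux kernel reports as online.
# /sys/devices/system/cpu/online uses the "cpulist" format, e.g. "0,4,8-11".
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Expand a cpulist string such as "0,4,8-11" into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def offline_cpus_in_mask(mask: int, online: set[int]) -> list[int]:
    """Return the CPUs selected by `mask` that are not in `online`."""
    requested = [i for i in range(mask.bit_length()) if (mask >> i) & 1]
    return [cpu for cpu in requested if cpu not in online]


if __name__ == "__main__":
    path = Path("/sys/devices/system/cpu/online")
    if path.exists():  # only present on Linux
        online = parse_cpulist(path.read_text())
        print("online CPUs:", sorted(online))
        # A mask whose CPUs are all offline cannot be applied with
        # sched_setaffinity(2); the call fails with EINVAL.
        print("offline in 0x0002:", offline_cpus_in_mask(0x0002, online))
```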
I have attached my configuration and logs. Thanks a lot!

*********** SLURM.CONF: ***********
TaskPlugin=task/affinity
TaskPluginParam=Cores,Verbose

*********** JOB.OUTPUT: ***********
cpu_bind=MASK - r09c3b3, task 0 0 [61214]: mask 0xffff set
cpu_bind=MASK - r09c3b3, task 2 2 [61229]: mask 0x100 set
cpu_bind=MASK - r09c3b3, task 1 1 [61228]: mask 0x10 set
cpu_bind=MASK - r09c3b3, task 0 0 [61227]: mask 0x1 set
cpu_bind=MASK - r09c3b3, task 3 3 [61230]: mask 0x1000 set
cpu_bind=MASK - r09c3b3, task 4 4 [61231]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 5 5 [61232]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 6 6 [61233]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 7 7 [61234]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 9 9 [61236]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 8 8 [61235]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 11 11 [61238]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 13 13 [61240]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 10 10 [61237]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 15 15 [61242]: mask 0x1111111111111111 set FAILED
cpu_bind=MASK - r09c3b3, task 14 14 [61241]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 12 12 [61239]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
srun: error: r09c3b3: tasks 4-15: Exited with exit code 1

********** SLURM.LOG: **********
[2012-12-05T12:07:35+00:00] task_slurmd_batch_request: 430973
[2012-12-05T12:07:35+00:00] task/affinity: job 430973 CPU input mask for node: 0xFFFF
[2012-12-05T12:07:35+00:00] task/affinity: job 430973 CPU final HW mask for node: 0xFFFF
[2012-12-05T12:07:35+00:00] Launching batch job 430973 for UID 50158
[2012-12-05T12:07:35+00:00] [430973] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] task affinity : enforcing 'verbose,cores' cpu bind method
[2012-12-05T12:07:35+00:00] lllp_distribution jobid [430973] binding: verbose,cores, dist 2
[2012-12-05T12:07:35+00:00] _task_layout_lllp_cyclic
[2012-12-05T12:07:35+00:00] _lllp_generate_cpu_bind jobid [430973]: verbose,mask_cpu, 0x0001,0x0010,0x0100,0x1000,0x0002,0x0004,0x0008,0x0020,0x0040,0x0080,0x0200,0x0400,0x0800,0x2000,0x4000,0x8000
[2012-12-05T12:07:35+00:00] launch task 430973.0 request from [email protected] (port 39175)
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:36+00:00] [430973.0] done with job
[2012-12-05T12:07:36+00:00] [430973] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0
[2012-12-05T12:07:36+00:00] [430973] done with job
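As a cross-check of the log above, the sketch below takes the mask list from the _lllp_generate_cpu_bind line and asks which tasks would receive a mask with no online CPU. It assumes task i gets the i-th entry (consistent with the job output for tasks 0-3) and, loudly, that the online CPUs are 0, 4, ..., 60 as expected for SMT-off POWER7; neither the online set nor the helper name comes from SLURM:

```python
# Sketch: which tasks of step 430973.0 were handed only offline CPUs?
# MASK_CPU is copied from the _lllp_generate_cpu_bind log line; task i is
# assumed to receive the i-th entry. ASSUMED_ONLINE is an ASSUMPTION for
# SMT-off POWER7 (CPUs 0, 4, ..., 60), not read from the node.
MASK_CPU = ("0x0001,0x0010,0x0100,0x1000,0x0002,0x0004,0x0008,0x0020,"
            "0x0040,0x0080,0x0200,0x0400,0x0800,0x2000,0x4000,0x8000")
ASSUMED_ONLINE = set(range(0, 64, 4))


def tasks_without_online_cpu(mask_list: str, online: set[int]) -> list[int]:
    bad = []
    for task, mask_hex in enumerate(mask_list.split(",")):
        mask = int(mask_hex, 16)
        cpus = {i for i in range(mask.bit_length()) if (mask >> i) & 1}
        if not cpus & online:  # no online CPU left in this task's mask
            bad.append(task)
    return bad


print(tasks_without_online_cpu(MASK_CPU, ASSUMED_ONLINE))
```

Under these assumptions the result is tasks 4 to 15, the same range srun reports as failing. If the assumption about online CPUs holds, this would suggest the masks are generated for CPU IDs 0-15 (the 0xFFFF node mask in the log) rather than for the sparse IDs the kernel actually has online; this is an inference from the logs, not a confirmed diagnosis.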
