Hello,

I am testing Slurm 2.5-rc2 on a cluster of PS702 nodes (two POWER7 processors with 8 cores each, SMT disabled), and I have a problem with the task/affinity plugin when it sets CPU masks.

When a job requests 4 or fewer tasks, the masks are set correctly. However, when it requests 5 or more tasks, Slurm reports an error while setting the masks:

cpu_bind=MASK - r09c3b3, task 5 5 [61232]: mask 0x1111111111111111 set FAILED

What could be causing this problem? Could it be a configuration error, or a Slurm bug? Note that the CPU IDs are not consecutive, because these nodes allow multithreading to be enabled and disabled without rebooting the node.
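
To check which logical CPU IDs the kernel reports as online on such a node, a small sketch along these lines may help. It assumes a Linux node that exposes /sys/devices/system/cpu/online; the helper names parse_cpu_list and online_cpus are made up for illustration and are not part of Slurm.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,4,8-11' into sorted CPU IDs."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty file or a trailing comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the online CPU IDs, or None if the sysfs file is unavailable."""
    try:
        return parse_cpu_list(Path(path).read_text())
    except OSError:
        return None
```

Calling online_cpus() on the compute node and comparing the result with the masks in the logs would show whether a mask names a CPU ID that is not online.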

I have included my configuration and logs below.

Thanks a lot!

***********
SLURM.CONF:
***********
TaskPlugin=task/affinity
TaskPluginParam=Cores,Verbose

***********
JOB.OUTPUT:
***********
cpu_bind=MASK - r09c3b3, task  0  0 [61214]: mask 0xffff set
cpu_bind=MASK - r09c3b3, task  2  2 [61229]: mask 0x100 set
cpu_bind=MASK - r09c3b3, task  1  1 [61228]: mask 0x10 set
cpu_bind=MASK - r09c3b3, task  0  0 [61227]: mask 0x1 set
cpu_bind=MASK - r09c3b3, task  3  3 [61230]: mask 0x1000 set
cpu_bind=MASK - r09c3b3, task 4 4 [61231]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 5 5 [61232]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 6 6 [61233]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 7 7 [61234]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 9 9 [61236]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 8 8 [61235]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 11 11 [61238]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 13 13 [61240]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 10 10 [61237]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 15 15 [61242]: mask 0x1111111111111111 set FAILED
cpu_bind=MASK - r09c3b3, task 14 14 [61241]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
slurmd[r09c3b3]: Failed task affinity setup
cpu_bind=MASK - r09c3b3, task 12 12 [61239]: mask 0x1111111111111111 set FAILED
slurmd[r09c3b3]: Failed task affinity setup
srun: error: r09c3b3: tasks 4-15: Exited with exit code 1


**********
SLURM.LOG:
**********
[2012-12-05T12:07:35+00:00] task_slurmd_batch_request: 430973
[2012-12-05T12:07:35+00:00] task/affinity: job 430973 CPU input mask for node: 0xFFFF
[2012-12-05T12:07:35+00:00] task/affinity: job 430973 CPU final HW mask for node: 0xFFFF
[2012-12-05T12:07:35+00:00] Launching batch job 430973 for UID 50158
[2012-12-05T12:07:35+00:00] [430973] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] task affinity : enforcing 'verbose,cores' cpu bind method
[2012-12-05T12:07:35+00:00] lllp_distribution jobid [430973] binding: verbose,cores, dist 2
[2012-12-05T12:07:35+00:00] _task_layout_lllp_cyclic
[2012-12-05T12:07:35+00:00] _lllp_generate_cpu_bind jobid [430973]: verbose,mask_cpu, 0x0001,0x0010,0x0100,0x1000,0x0002,0x0004,0x0008,0x0020,0x0040,0x0080,0x0200,0x0400,0x0800,0x2000,0x4000,0x8000
[2012-12-05T12:07:35+00:00] launch task 430973.0 request from [email protected] (port 39175)
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Using sched_affinity for tasks
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:35+00:00] [430973.0] Failed task affinity setup
[2012-12-05T12:07:36+00:00] [430973.0] done with job
[2012-12-05T12:07:36+00:00] [430973] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0
[2012-12-05T12:07:36+00:00] [430973] done with job
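
For reading the hexadecimal masks in these logs, a minimal decoding sketch follows. The function name mask_to_cpus is hypothetical and is not a Slurm API; it only lists which bit positions (logical CPU IDs) are set in a mask and says nothing about whether those IDs are online on the node.

```python
def mask_to_cpus(mask):
    """Return the CPU IDs (bit positions) set in a hex mask string."""
    value = int(mask, 16)  # int() accepts an optional '0x' prefix for base 16
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Masks copied from the job output and from the _lllp_generate_cpu_bind line.
for mask in ("0x0001", "0x0010", "0x0100", "0x1000", "0x0002",
             "0x1111111111111111"):
    print(mask, mask_to_cpus(mask))
```

Decoded this way, the first four generated masks name CPU IDs 0, 4, 8 and 12, the fifth (0x0002) names CPU ID 1, and 0x1111111111111111 names every fourth ID from 0 to 60. Whether CPU ID 1 is actually offline on this node is not shown by the logs and would need to be checked on the node itself.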



