Hi,

Your cluster is very similar to ours, where Torque and OpenMPI
are installed.

I would use this command line:

#PBS -l nodes=2:ppn=12
mpirun --report-bindings -np 16 <executable file name>
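
In case a complete script is helpful, a minimal sketch would be
(the job name and ./myprog are placeholders for your own):

#!/bin/bash
#PBS -l nodes=2:ppn=12
#PBS -N mpi_job
# Torque starts the job in $HOME, so move to the directory qsub was run from
cd $PBS_O_WORKDIR
mpirun --report-bindings -np 16 ./myprog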

Here --map-by socket:pe=1 and --bind-to core are assumed as the default
settings (see the spelled-out form below). Then you can run 10 jobs
independently and simultaneously, because you have 20 nodes in total.
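
If you want to spell out those defaults instead of relying on them,
the same placement can be requested explicitly:

mpirun --report-bindings --map-by socket:pe=1 --bind-to core -np 16 <executable file name>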

While each node in your cluster has 12 cores, only 8 MPI processes run
on each node, which means 8/12 ≈ 66.7% usage, not 100%.
I think this loss cannot be avoided as long as you use 16*N MPI processes
per job. It's a mismatch with your cluster, which has 12 cores per node.
If you can use 12*N MPI processes per job, that would be the most
efficient (see the sketch below).
Is there any reason why you use 16*N MPI processes per job?
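
For example, a 2-node job sized to the node width (24 = 12 cores * 2 nodes)
would be:

#PBS -l nodes=2:ppn=12
mpirun --report-bindings -np 24 <executable file name>

That places 12 processes on each node, so all cores are in use.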

Tetsuya
