When you specify slots=16, you are no longer oversubscribed, and so we don't 
back down the MPI aggressiveness on messaging. When you are oversubscribed, we 
have each MPI process release its scheduling slice back to the OS while it is 
waiting for a message.

Overloaded and aggressive = bad performance.
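
If you deliberately want that oversubscribed (yielding) behavior while keeping 
slots=16, you can force it yourself; a rough sketch, reusing your own flags and 
assuming the mpi_yield_when_idle MCA parameter from the 1.7 series:

time mpirun --hostfile ./hostfile -np 16 --mca mpi_yield_when_idle 1 \
    --oversubscribe -bind-to core:overload-allowed --ppr 4:core \
    --report-bindings ./ft.C.16

That should behave like your slots=4 run, since each proc yields the core while 
it waits instead of spinning.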

On Nov 13, 2013, at 8:32 AM, Iván Cores González <ivan.cor...@udc.es> wrote:

> Hi,
> I am running the NAS parallel benchmarks and I have a performance problem
> depending on the hostfile configuration. I use Open MPI version 1.7.2.
> 
> I run the FT benchmark with 16 processes, but I want to overload each core
> with 4 processes (yes, I really do want this), so I execute:
> 
> time mpirun --hostfile ./hostfile -np 16 --oversubscribe -bind-to 
> core:overload-allowed --ppr 4:core --report-bindings ./ft.C.16
> 
> and the hostfile is (each node has 2 octo-core Intel Xeon processors):
> compute-0-15 slots=4
> 
> I check the core mapping with the "top" command and the 16 processes run 
> on 4 physical cores. The execution time in this configuration is 80 seconds.
> 
> The problem is that if I change the hostfile to:
> compute-0-15 slots=16
> 
> and I run the same mpirun command (overloading each core with 4 
> processes), the execution time increases to 240 seconds (!). 
> I checked the core mapping again and the 16 processes were still running 
> on the same 4 cores. 
> 
> Any idea what could explain the performance drop?
> 
> Thanks,
> Iván Cores.
> 
> P.S.:
> In both cases the binding is:
> [compute-0-15.local:14691] MCW rank 15 bound to socket 0[core 3[hwt 0-1]]: 
> [../../../BB/../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: 
> [BB/../../../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 1 bound to socket 0[core 0[hwt 0-1]]: 
> [BB/../../../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 2 bound to socket 0[core 0[hwt 0-1]]: 
> [BB/../../../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 3 bound to socket 0[core 0[hwt 0-1]]: 
> [BB/../../../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 4 bound to socket 0[core 1[hwt 0-1]]: 
> [../BB/../../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 5 bound to socket 0[core 1[hwt 0-1]]: 
> [../BB/../../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 6 bound to socket 0[core 1[hwt 0-1]]: 
> [../BB/../../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 7 bound to socket 0[core 1[hwt 0-1]]: 
> [../BB/../../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 8 bound to socket 0[core 2[hwt 0-1]]: 
> [../../BB/../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 9 bound to socket 0[core 2[hwt 0-1]]: 
> [../../BB/../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 10 bound to socket 0[core 2[hwt 0-1]]: 
> [../../BB/../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 11 bound to socket 0[core 2[hwt 0-1]]: 
> [../../BB/../../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 12 bound to socket 0[core 3[hwt 0-1]]: 
> [../../../BB/../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 13 bound to socket 0[core 3[hwt 0-1]]: 
> [../../../BB/../../../..][../../../../../../../..]
> [compute-0-15.local:14691] MCW rank 14 bound to socket 0[core 3[hwt 0-1]]: 
> [../../../BB/../../../..][../../../../../../../..]
> 
