Thanks for your answer. You are right.
 I have tried 4 nodes with 6 processes per node, and things are worse.
 
So do you suggest that the only thing to do is to order an InfiniBand switch, or 
is there a possibility to improve things by tuning MCA parameters?
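
In case it is useful, here is a rough sketch of how TCP-related MCA parameters can be 
passed on the mpirun command line with Open MPI 1.6. The interface name eth0 and the 
buffer sizes are only illustrative placeholders, not recommendations, and <solver> 
stands for the Code Saturne executable:

    # force the shared-memory, self and TCP transports, and pin the TCP BTL
    # to the Gbit interface that actually carries the MPI traffic
    mpirun --mca btl sm,self,tcp --mca btl_tcp_if_include eth0 -np 24 <solver>

    # larger TCP socket buffers sometimes help on Gbit Ethernet
    mpirun --mca btl_tcp_sndbuf 524288 --mca btl_tcp_rcvbuf 524288 -np 24 <solver>

    # list the TCP BTL parameters recognized by the installed Open MPI
    ompi_info --param btl tcp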
 


________________________________
From: Ralph Castain <r...@open-mpi.org>
To: Dugenoux Albert <dugeno...@yahoo.fr>; Open MPI Users <us...@open-mpi.org> 
Sent: Tuesday, July 10, 2012, 4:47 PM
Subject: Re: [OMPI users] Bad parallel scaling using Code Saturne with openmpi


I suspect it mostly reflects communication patterns. I don't know anything 
about Saturne, but shared memory is a great deal faster than TCP, so the more 
processes sharing a node the better. You may also be hitting some natural 
boundary in your model - perhaps with 8 processes/node you wind up with more 
processes that cross the node boundary, further increasing the communication 
requirement. 

Do things continue to get worse if you use all 4 nodes with 6 processes/node?
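
(For reference, a sketch of how the per-node process count can be controlled directly 
from mpirun; the flag names are those of Open MPI 1.6, and <solver> is a placeholder 
for the Code Saturne executable:)

    # 24 ranks spread as 6 processes on each of the 4 nodes
    mpirun -np 24 -npernode 6 --report-bindings <solver>

    # for comparison, 12 processes on each of 2 nodes, which keeps more of the
    # communication inside shared memory
    mpirun -np 24 -npernode 12 --report-bindings <solver>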



On Jul 10, 2012, at 7:31 AM, Dugenoux Albert wrote:

Hi.
>
>I have recently built a cluster on a Dell PowerEdge server running Debian 6.0. 
>This server is composed of 4 system boards, each with 2 hex-core processors, 
>so each board provides 12 cores.
>The boards are linked through a local Gbit switch. 
>
>In order to parallelize the software Code Saturne, which is a CFD solver, I 
>have configured the cluster with a PBS server/mom on 1 system board and a mom 
>on each of the 3 other boards. This gives 48 cores distributed over 4 nodes of 
>12 CPUs each. Code Saturne is compiled against OpenMPI 1.6.
>
>When I launch a simulation using 2 nodes with 12 cores each, the elapsed time 
>is good and the network is not saturated.
>But when I launch the same simulation using 3 nodes with 8 cores each, the 
>elapsed time is 5 times the previous one.
>In both cases I use 24 cores, and the network does not seem to be saturated. 
>
>I have tested several configurations: binaries on the local file system or on 
>NFS. The results are the same.
>I have visited several forums (in particular 
>http://www.open-mpi.org/community/lists/users/2009/08/10394.php)
>and read lots of threads, but as I am not an expert on clusters, I do not yet 
>see what is wrong!
>
>Is it a problem with the PBS configuration (I installed it from the deb 
>packages), a subtle OpenMPI compilation option, or a bad network configuration?
>
>Regards.
>
>B. S.
