Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Jeff Squyres (jsquyres)
On Sep 18, 2015, at 7:26 PM, Gilles Gouaillardet wrote: > I built a similar environment with master and private IP and that does not work. My understanding is that each task has two TCP BTLs (one per interface), and there is currently no mechanism to tell that a node is unreachable via a g

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Gilles Gouaillardet
Jeff, I built a similar environment with master and private IP and that does not work. My understanding is that each task has two TCP BTLs (one per interface), and there is currently no mechanism to tell that a node is unreachable via a given BTL. (A BTL picks the "best" interface for each node, but it

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Shang Li
Hi Jeff, Thanks for your suggestion. (And also thanks to Gilles!) I'll play around with your suggestions and let you know if I make any progress. About the version of my Open MPI: it's a Texas Instruments implementation, so the version number 1.0.0.22 is their own version. I looked at their

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Jeff Squyres (jsquyres)
Whoa; wait -- are you really using Open MPI v1.0? That's over 10 years old... Can you update to Open MPI v1.10? > On Sep 18, 2015, at 1:37 PM, Jeff Squyres (jsquyres) wrote: > Open MPI uses different heuristics depending on whether IP addresses are public or private. All your IP

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Jeff Squyres (jsquyres)
Open MPI uses different heuristics depending on whether IP addresses are public or private. All your IP addresses are technically "public" -- they're not in 10.x.x.x or 192.168.x.x, for example. So Open MPI assumes that they are all routable to each other. You might want to change your 3 netwo
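The public/private distinction mentioned in this message can be illustrated with a short sketch. It is not Open MPI's code; it only checks whether an IPv4 address falls in one of the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The function name `is_rfc1918` and the sample addresses are hypothetical and chosen for illustration.

```python
import ipaddress

# RFC 1918 private IPv4 ranges. This list is an assumption about what
# "private" means in the message above, not taken from Open MPI's source.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside an RFC 1918 range."""
    ip = ipaddress.ip_address(addr)
    if ip.version != 4:
        return False
    return any(ip in net for net in RFC1918_NETWORKS)


if __name__ == "__main__":
    # Hypothetical sample addresses, not the ones from the original thread.
    for sample in ["10.1.2.3", "192.168.0.5", "172.16.0.1", "192.0.2.1"]:
        kind = "private" if is_rfc1918(sample) else "public"
        print(f"{sample}: {kind}")
```

Running such a check over the addresses assigned to each interface shows which of them would count as "public" under this definition.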

Re: [OMPI users] C/R Enabled Debugging

2015-09-18 Thread Dave Love
"gzzh...@buaa.edu.cn" writes: > Hi Team > I am trying to use the MPI to do some test and study on the C/R enabled debugging. Professor Josh Hursey said that the feature never made it into a release so it was only ever available on the trunk. However, since that time the C/R functionality

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-18 Thread Gilles Gouaillardet
Thanks Patrick, could you please try again with the --hetero-nodes mpirun option? (I am afk, and not 100% sure about the syntax.) Could you also submit a job with 2 nodes and 4 cores on each node that does cat /proc/self/status oarshmost cat /proc/self/status btw, is there any reason why do yo

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-18 Thread Patrick Begou
Gilles Gouaillardet wrote: Patrick, by the way, this will work when running on a single node. I do not know what will happen when you run on multiple nodes ... since there is no OAR integration in Open MPI, I guess you are using ssh to start orted on the remote nodes (unless you instructed ompi

[OMPI users] possible GATS bug in osc/sm

2015-09-18 Thread Steffen Christgau
Hi folks, [the following discussion is based on v1.8.8] Suppose you have an MPI one-sided program using general active target synchronization (GATS). In that program, a single origin process performs two rounds of communication, i.e. two access epochs, to different target process groups. The targe

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-18 Thread Gilles Gouaillardet
Patrick, by the way, this will work when running on a single node. I do not know what will happen when you run on multiple nodes ... since there is no OAR integration in Open MPI, I guess you are using ssh to start orted on the remote nodes (unless you instructed ompi to use an OARified version

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-18 Thread Gilles Gouaillardet
Patrick, I just filed PR 586 https://github.com/open-mpi/ompi-release/pull/586 for the v1.10 series. This is only a three-line patch. Could you please give it a try? Cheers, Gilles On 9/18/2015 4:54 PM, Patrick Begou wrote: Ralph Castain wrote: As I said, if you don’t provide an explicit

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-18 Thread Patrick Begou
Ralph Castain wrote: As I said, if you don't provide an explicit slot count in your hostfile, we default to allowing oversubscription. We don't have OAR integration in OMPI, and so mpirun isn't recognizing that you are running under a resource manager - it thinks this is just being controlled b