[OMPI devel] modex receive

2016-04-28 Thread dpchoudh .
Hello all, I have been struggling with this issue for the last few days and thought it would be prudent to ask for help from people who have far more experience than I do. There are two questions, interrelated in my mind, though they may not be so in reality. Question 2 is the issue I am struggling with, and questio

Re: [OMPI devel] modex receive

2016-04-28 Thread Gilles Gouaillardet
The add_procs subroutine of the BTL should be called. (I added a printf in mca_btl_tcp_add_procs and it *is* invoked.) Can you try again with --mca pml ob1 --mca pml_base_verbose 100? Maybe the add_procs subroutine is not invoked because Open MPI uses cm instead of ob1. Cheers, Gilles

Re: [OMPI devel] modex receive

2016-04-28 Thread George Bosilca
In Open MPI a process only retrieves information about a peer if they communicate. Thus, add_proc is called from the two sides of a connection establishment: when a connection is decided locally, or when a network packet requires the existence of a proc (for the initiator of the connection). Th
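The on-demand behaviour described above can be pictured with a toy model. This is only a sketch of the idea that peer information is fetched the first time two processes communicate; the class LazyPeerTable, its methods, and the sample addresses are hypothetical and are not Open MPI code.

```python
class LazyPeerTable:
    """Toy model: peer information is fetched only on first communication."""

    def __init__(self, directory):
        self._directory = directory  # stands in for data published by all peers
        self._known = {}             # peers this process has actually looked up
        self.lookups = 0             # number of real retrievals performed

    def endpoint(self, rank):
        # Retrieve the peer's information only if we have never talked to it.
        if rank not in self._known:
            self.lookups += 1
            self._known[rank] = self._directory[rank]
        return self._known[rank]


# Hypothetical three-process job; this process only ever talks to rank 1.
table = LazyPeerTable({0: "10.0.0.1", 1: "10.0.0.2", 2: "10.0.0.3"})
table.endpoint(1)
table.endpoint(1)
print(table.lookups)
```

In this model, ranks 0 and 2 are never looked up because no message is exchanged with them.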

[OMPI devel] Why is floating point number used for locality

2016-04-28 Thread dpchoudh .
Hello all, I am wondering about the rationale for using floating point numbers to calculate 'distances' in the openib BTL. Is it because some distances can be infinite and there is no (conventional) way to represent infinity using integers? Thanks for your comments, Durga

Re: [OMPI devel] Why is floating point number used for locality

2016-04-28 Thread Brice Goglin
It comes from the hwloc API. It doesn't use integers because some users want to provide their own distance matrix that was generated by benchmarks. Also, we normalize the matrix to have latency 1 on the diagonal (for local memory access latency) and that causes non-diagonal items not to be integers
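As a rough illustration of why normalization produces non-integer entries, here is a minimal sketch. It assumes a benchmark-derived latency matrix and assumes that normalizing means dividing every entry by the local (diagonal) latency; the function normalize_distances and the sample values are invented for illustration and are not part of the hwloc API.

```python
def normalize_distances(latency):
    """Scale a square latency matrix so that local access (the diagonal) is 1.0.

    Assumption: every diagonal entry holds the same local latency, so a
    single scale factor is applied to the whole matrix.
    """
    local = latency[0][0]
    return [[value / local for value in row] for row in latency]


# Hypothetical measured latencies (nanoseconds) for a two-node NUMA machine.
measured = [[80, 130],
            [130, 80]]

normalized = normalize_distances(measured)
print(normalized)
```

Even with integer inputs, the off-diagonal entries come out fractional (here 130/80), which an integer distance type could not hold without losing precision.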

[OMPI devel] 2.0.0 is coming: what do we need to communicate to users?

2016-04-28 Thread Jeff Squyres (jsquyres)
We're getting darn close to v2.0.0. What "gotchas" do we need to communicate to users? I.e., what will people upgrading from v1.8.x/v1.10.x be surprised by? The most obvious one I can think of is mpirun requiring -np when slots are not specified somehow. What else do we need to communicate?

Re: [OMPI devel] psm mtl and no link

2016-04-28 Thread Henry Estela
Gilles, I have Truescale/qib hardware. I will try to reproduce the error and make some comments. Thanks, Henry

[OMPI devel] Open MPI v2.0.0rc2

2016-04-28 Thread Jeff Squyres (jsquyres)
At long last, here's the next v2.0.0 release candidate: 2.0.0rc2: https://www.open-mpi.org/software/ompi/v2.x/ We didn't keep a good list of all the things that have changed since rc1 -- but it's many things. Here's a link to the NEWS file for v2.0.0: https://github.com/open-mpi/ompi-r

Re: [OMPI devel] modex receive

2016-04-28 Thread dpchoudh .
Hello Gilles, you are absolutely right: 1. Adding --mca pml_base_verbose 100 does show that it is the cm PML that is being picked by default (even for TCP). 2. Adding --mca pml ob1 does cause add_procs() and related BTL friends to be invoked. With a command line of mpirun -np 2 -hostfile ~/hostf

Re: [OMPI devel] modex receive

2016-04-28 Thread Ralph Castain
CM is not being selected for TCP - you specified TCP for the BTLs, but that assumes that a BTL will be selected. You obviously have something in your system that is supported by an MTL, and that will always be selected before a BTL.

Re: [OMPI devel] modex receive

2016-04-28 Thread Gilles Gouaillardet
My basic understanding is that ob1 works with BTLs and cm works with MTLs (someone please correct me if I am wrong). Another way to put this is that cm cannot use the tcp BTL. So I can only guess that one MTL (PSM?) is available, and so cm is preferred over ob1. What if you mpirun --mca mtl ^psm ... is cm
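The selection behaviour discussed in this thread can be pictured with a toy model. This is only a sketch of the reasoning in the posts (cm drives MTLs, ob1 drives BTLs, and an available MTL is preferred); the function choose_pml and its arguments are hypothetical and do not mirror Open MPI's actual selection code.

```python
def choose_pml(available_mtls, available_btls, excluded_mtls=()):
    """Toy model: prefer cm when any usable MTL exists, else fall back to ob1."""
    usable_mtls = [m for m in available_mtls if m not in excluded_mtls]
    if usable_mtls:
        return "cm", usable_mtls
    if available_btls:
        return "ob1", list(available_btls)
    return None, []


# An MTL is present, so cm is chosen even though the tcp BTL was requested.
print(choose_pml(["psm"], ["tcp", "self"]))

# Excluding the MTL (the idea behind --mca mtl ^psm) leaves only BTLs, so ob1 is chosen.
print(choose_pml(["psm"], ["tcp", "self"], excluded_mtls=["psm"]))
```

In this model, requesting a BTL does not by itself force ob1; only removing every usable MTL does, which matches the suggestion to try --mca mtl ^psm or to select ob1 explicitly with --mca pml ob1.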