my basic understanding is that ob1 works with btl, and cm works with mtl
(someone please correct me if I am wrong)
another way to put this is that cm cannot use the tcp btl.
so I can only guess one mtl (PSM ?) is available, and so cm is preferred
over ob1.
what if you
mpirun --mca mtl ^psm ...
is cm
CM is not being selected for TCP - you specified TCP for the BTLs, but that
assumes that a BTL will be selected. You obviously have something in your
system that is supported by an MTL, and that will always be selected before a
BTL.
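
The selection behaviour described in this thread can be sketched as a toy model. This is not Open MPI code: the function names `select_pml` and `apply_filter` are hypothetical, and the rule "any usable MTL means cm wins" is a simplification of what the messages above describe. The only real syntax modelled is the `^` exclusion prefix used in `--mca mtl ^psm`.

```python
def apply_filter(components, spec):
    """Apply an MCA-style include/exclude list to component names.

    spec=None keeps everything, "^a,b" excludes a and b,
    "a,b" keeps only a and b.
    """
    if spec is None:
        return list(components)
    if spec.startswith("^"):
        excluded = set(spec[1:].split(","))
        return [c for c in components if c not in excluded]
    included = set(spec.split(","))
    return [c for c in components if c in included]


def select_pml(available_mtls, mtl_filter=None, forced_pml=None):
    """Toy model (an assumption, not the real selection logic).

    - An explicit choice (like --mca pml ob1) always wins.
    - Otherwise cm is picked whenever at least one MTL is usable.
    - Otherwise ob1 is picked, and ob1 is what drives the BTLs.
    """
    if forced_pml is not None:
        return forced_pml
    usable_mtls = apply_filter(available_mtls, mtl_filter)
    return "cm" if usable_mtls else "ob1"


# A host where the PSM MTL is usable: cm is chosen, so the tcp BTL is never used.
print(select_pml(["psm"]))
# Excluding psm (as in --mca mtl ^psm) leaves no MTL, so ob1 is chosen.
print(select_pml(["psm"], mtl_filter="^psm"))
# Forcing the PML (as in --mca pml ob1) bypasses the MTL check entirely.
print(select_pml(["psm"], forced_pml="ob1"))
```

In this model, asking for the tcp BTL has no effect on which PML is picked, which matches the observation that add_procs() in the BTL is only reached once ob1 is selected.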
> On Apr 28, 2016, at 8:22 PM, dpchoudh . wrote:
>
> Hello
Hello Gilles
You are absolutely right:
1. Adding --mca pml_base_verbose 100 does show that it is the cm PML that
is being picked by default (even for TCP)
2. Adding --mca pml ob1 does cause add_procs() and related BTL friends to
be invoked.
With a command line of
mpirun -np 2 -hostfile ~/hostf
At long last, here's the next v2.0.0 release candidate: 2.0.0rc2:
https://www.open-mpi.org/software/ompi/v2.x/
We didn't keep a good list of all the things that have changed since rc1 -- but
it's many things. Here's a link to the NEWS file for v2.0.0:
https://github.com/open-mpi/ompi-r
Gilles,
I have Truescale/qib hardware, I will try to reproduce the error and make
some comments.
Thanks,
Henry
We're getting darn close to v2.0.0.
What "gotchas" do we need to communicate to users? I.e., what will people
upgrading from v1.8.x/v1.10.x be surprised by?
The most obvious one I can think of is mpirun requiring -np when slots are not
specified somehow.
What else do we need to communicate?
It comes from the hwloc API. It doesn't use integers because some users
want to provide their own distance matrix that was generated by
benchmarks. Also we normalize the matrix to have latency 1 on the
diagonal (for local memory access latency) and that causes non-diagonal
items not to be integers.
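
A small worked example of why normalization produces non-integers. This is only a sketch: the function name `normalize_latencies` and the latency values are made up for illustration, and it assumes every diagonal entry holds the same local latency, so dividing by the first diagonal entry is enough. It does not reproduce how hwloc computes its matrix.

```python
def normalize_latencies(matrix):
    """Scale a square latency matrix so the local (diagonal) latency is 1.

    Assumes all diagonal entries are equal, so dividing every entry by
    matrix[0][0] puts 1.0 on the whole diagonal.
    """
    local = matrix[0][0]
    return [[entry / local for entry in row] for row in matrix]


# Hypothetical benchmark results: integer latencies for two NUMA nodes.
raw = [
    [10, 21],
    [21, 10],
]

normalized = normalize_latencies(raw)
for row in normalized:
    print(row)
```

The inputs are integers, but the off-diagonal entry becomes 21 / 10, which is not a whole number, so an integer distance type would lose information here.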
Hello all
I am wondering about the rationale of using floating point numbers for
calculating 'distances' in the openib BTL. Is it because some distances can
be infinite and there is no (conventional) way to represent infinity using
integers?
Thanks for your comments
Durga
The surgeon general a
In Open MPI a process only retrieves information about a peer if they
communicate. Thus, the add_proc is called from the two sides of a
connection establishment, when locally a connection is decided or when a
network packet requires the existence of a proc (for the initiator of the
connection). Th
the add_procs subroutine of the btl should be called.
/* i added a printf in mca_btl_tcp_add_procs and it *is* invoked */
can you try again with --mca pml ob1 --mca pml_base_verbose 100 ?
maybe the add_procs subroutine is not invoked because openmpi uses cm
instead of ob1
Cheers,
Gilles
Hello all
I have been struggling with this issue for the last few days and thought it would be
prudent to ask for help from people who have way more experience than I do.
There are two questions, interrelated in my mind, but may not be so in
reality. Question 2 is the issue I am struggling with, and questio