Hi,
I went through this code
(https://github.com/open-mpi/ompi/blob/main/opal/mca/common/ucx/common_ucx.c#L216)
last week, and the logic can be summarized as:
* ask for the available transports on a context
* check if one of the transports specified by opal_common_ucx_tls (or
mca_pml_ucx_tls or
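The first step of that logic (querying the transports available on a node) can be inspected from the shell with UCX's `ucx_info` tool, if it is installed. A guarded sketch:

```shell
# List the transports UCX detects on this machine; fall back to a message
# where ucx_info is not installed so the snippet degrades gracefully.
if command -v ucx_info >/dev/null 2>&1; then
  tls_report=$(ucx_info -d | grep -i transport)
else
  tls_report="ucx_info not available on this machine"
fi
printf '%s\n' "$tls_report"
```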
I was able to run with UCX with the command below (Ref:
https://www.mail-archive.com/users@lists.open-mpi.org/msg34585.html)
$ mpirun -np 2 --map-by core --bind-to core --mca pml ucx --mca
pml_base_verbose 10 --mca mtl_base_verbose 10 -x OMPI_MCA_pml_ucx_verbose=10 -x
Per George's comments, I stand corrected: UCX does work fine in single-node
cases -- he confirmed to me that he tested it on his laptop, and it worked for
him.
I think some of the mails in this thread got delivered out of order. Edgar's
and George's comments about how/when the UCX PML is
Edgar is right, UCX_TLS plays a role in the selection. You can see the
current settings by running `ucx_info -c`. In my case, UCX_TLS is set to
`all` somehow, and I had either a not-connected IB device or a GPU.
However, I did not set UCX_TLS manually, and I can't see it set anywhere on my
system.
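Checking the effective UCX_TLS value the same way can be sketched as below (the fallback string is just for machines without UCX installed):

```shell
# Print the UCX_TLS setting UCX would use (build-time default or env override).
if command -v ucx_info >/dev/null 2>&1; then
  tls_setting=$(ucx_info -c | grep '^UCX_TLS' || echo 'UCX_TLS=(not reported)')
else
  tls_setting="UCX_TLS=unknown (ucx_info not installed)"
fi
printf '%s\n' "$tls_setting"
```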
UCX will disqualify itself unless it finds a CUDA, ROCm, or InfiniBand network to
use. To allow UCX to run a regular shared-memory job without GPUs or IB, you
have to set the UCX_TLS environment variable explicitly to allow UCX to run over
shm, e.g.:
mpirun -x
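A guarded sketch of a full command line along those lines; the `UCX_TLS=sm,self` value and `./my_app` binary are assumptions for illustration (older UCX releases spell the shared-memory transport `shm`):

```shell
# Hedged sketch: allow UCX to run on a GPU-less, IB-less node by explicitly
# enabling the shared-memory ('sm') and loopback ('self') transports.
ucx_shm_cmd="mpirun -np 2 -x UCX_TLS=sm,self --mca pml ucx ./my_app"
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_app ]; then
  $ucx_shm_cmd
else
  # Without an MPI install (or the app binary) just show the intended command.
  printf 'would run: %s\n' "$ucx_shm_cmd"
fi
```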
That being said, you're passing "--mca pml ucx" in the correct place now, and
you're therefore telling Open MPI "_only_ use the UCX
Hi,
Yes, it is run on a single node; there is no IB or RoCE attached to it.
Pasting the complete output (I might have mistakenly copy-pasted the command in
the previous mail):
perf_benchmark $ mpirun -np 2 --map-by core --bind-to core --mca pml ucx --mca
pml_base_verbose 10
The ucx PML should work just fine even in a single-node scenario. As Jeff
indicated, you need to move the MCA param `--mca pml ucx` before your
command.
George.
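To illustrate the placement George and Jeff are describing: mpirun stops parsing its own options at the executable name, so anything after it is passed to the application instead. A sketch using the `./perf` benchmark from the thread:

```shell
# MCA options must appear before the executable; everything after the
# executable name is handed to the application as its own arguments.
correct="mpirun -np 32 --map-by core --bind-to core --mca pml ucx ./perf"
wrong="mpirun -np 32 --map-by core --bind-to core ./perf --mca pml ucx"
printf 'correct: %s\nwrong:   %s\n' "$correct" "$wrong"
```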
On Mon, Mar 6, 2023 at 9:48 AM Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:
> If this run was on a single node,
If this run was on a single node, then UCX probably disabled itself since it
wouldn't be using InfiniBand or RoCE to communicate between peers.
Also, I'm not sure your command line was correct:
perf_benchmark $ mpirun -np 32 --map-by core --bind-to core ./perf --mca pml
ucx
You probably
Hi Gilles,
Yes, I am using xpmem, but I am getting the issue below.
https://github.com/open-mpi/ompi/issues/11463
--Arun
From: Gilles Gouaillardet
Sent: Monday, March 6, 2023 2:08 PM
To: Chandran, Arun
Subject: Re: [OMPI users] What is the best choice of pml
Hi Gilles,
Thanks very much for the information.
I was looking for the best pml + btl combination for a standalone intra-node
case with a high task count (>= 192) and no HPC-class networking installed.
I just now realized that I can't use pml ucx for such cases, as it is unable to find
IB and
Arun,
First, Open MPI selects a pml for **all** the MPI tasks (for example,
pml/ucx or pml/ob1).
Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct,
btl/vader) is used for each pair of MPI tasks
(tasks on the same node will use btl/vader, tasks on different nodes will
use
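That two-level selection can be pinned down explicitly on the command line. A hedged sketch of forcing the ob1 + shared-memory path for an intra-node run (`./my_app` is a placeholder; in Open MPI 4.x the shared-memory btl is `vader`, and `self` is needed for a rank talking to itself):

```shell
# Force the ob1 pml with the shared-memory btl for an intra-node job.
ob1_cmd="mpirun -np 2 --mca pml ob1 --mca btl vader,self ./my_app"
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_app ]; then
  $ob1_cmd
else
  # Without an MPI install (or the app binary) just show the intended command.
  printf 'would run: %s\n' "$ob1_cmd"
fi
```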
Hi Folks,
I can run benchmarks and find the pml+btl (ob1, ucx, uct, vader, etc.)
combination that gives the best performance,
but I wanted to hear from the community about what is generally used in
"high core count intra-node" cases before jumping to conclusions.
As I am a
Hi Folks,
As the number of cores in a socket keeps increasing, the right pml/btl
combination (ucx, ob1, uct, vader, etc.) that gives the best performance in
"intra-node" scenarios is important.
For openmpi-4.1.4, which pml, btl combination is the best for intra-node
communication in the case