Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Florent GERMAIN via users
Hi, I passed through this code (https://github.com/open-mpi/ompi/blob/main/opal/mca/common/ucx/common_ucx.c#L216) last week and the logic can be summarized as: * ask for the available transports on a context * check if one of the transports specified by opal_common_ucx_tls (or mca_pml_ucx_tls or
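A minimal sketch of relaxing that transport check from the command line, assuming the opal_common_ucx_tls MCA parameter mentioned above is exposed by the build in use and accepts "any" (./a.out is a placeholder binary):
$ mpirun -np 2 --mca pml ucx --mca opal_common_ucx_tls any ./a.out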

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Chandran, Arun via users
[Public] I was able to run with ucx using the command below (Ref: https://www.mail-archive.com/users@lists.open-mpi.org/msg34585.html) $ mpirun -np 2 --map-by core --bind-to core --mca pml ucx --mca pml_base_verbose 10 --mca mtl_base_verbose 10 -x OMPI_MCA_pml_ucx_verbose=10 -x

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Jeff Squyres (jsquyres) via users
Per George's comments, I stand corrected: UCX does work fine in single-node cases -- he confirmed to me that he tested it on his laptop, and it worked for him. I think some of the mails in this thread got delivered out of order. Edgar's and George's comments about how/when the UCX PML is

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread George Bosilca via users
Edgar is right, UCX_TLS has some role in the selection. You can see the current selection by running `ucx_info -c`. In my case, UCX_TLS is set to `all` somehow, and I had either a non-connected IB device or a GPU. However, I did not set UCX_TLS manually, and I can't see it anywhere in my system
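To inspect the effective value, the same tool can be grepped (assuming a standard UCX install; the exact output format may vary by release):
$ ucx_info -c | grep UCX_TLS
UCX_TLS=all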

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Edgar Gabriel via users
[AMD Official Use Only - General] UCX will disqualify itself unless it finds cuda, rocm, or an InfiniBand network to use. To allow UCX to run on a regular shared-memory job without GPUs or IB, you have to set the UCX_TLS environment variable explicitly to allow UCX to run over shm, e.g.: mpirun -x
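A minimal sketch of such a run, assuming UCX's "sm" (shared memory) and "self" transport aliases; ./osu_latency stands in for any benchmark binary:
$ mpirun -np 2 -x UCX_TLS=sm,self --mca pml ucx ./osu_latency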

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Jeff Squyres (jsquyres) via users
Per George's comments, I stand corrected: UCX does work fine in single-node cases -- he confirmed to me that he tested it on his laptop, and it worked for him. That being said, you're passing "--mca pml ucx" in the correct place now, and you're therefore telling Open MPI "_only_ use the UCX

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Chandran, Arun via users
[Public] Hi, Yes, it is run on a single node; there is no IB or RoCE attached to it. Pasting the complete o/p (I might have mistakenly copy-pasted the command in the previous mail) # perf_benchmark $ mpirun -np 2 --map-by core --bind-to core --mca pml ucx --mca pml_base_verbose 10

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread George Bosilca via users
The ucx PML should work just fine even in a single-node scenario. As Jeff indicated, you need to move the MCA param `--mca pml ucx` before your command. George. On Mon, Mar 6, 2023 at 9:48 AM Jeff Squyres (jsquyres) via users < users@lists.open-mpi.org> wrote: > If this run was on a single node,

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Jeff Squyres (jsquyres) via users
If this run was on a single node, then UCX probably disabled itself since it wouldn't be using InfiniBand or RoCE to communicate between peers. Also, I'm not sure your command line was correct: perf_benchmark $ mpirun -np 32 --map-by core --bind-to core ./perf --mca pml ucx You probably
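To illustrate the placement point: anything after the executable is passed as an argument to the application rather than to mpirun. A sketch, reusing the ./perf binary from the thread:
$ mpirun -np 32 --map-by core --bind-to core ./perf --mca pml ucx   # "--mca pml ucx" goes to ./perf, not to mpirun
$ mpirun -np 32 --map-by core --bind-to core --mca pml ucx ./perf   # mpirun sees the MCA option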

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Chandran, Arun via users
[AMD Official Use Only - General] Hi Gilles, Yes, I am using xpmem, but getting the below issue. https://github.com/open-mpi/ompi/issues/11463 --Arun From: Gilles Gouaillardet Sent: Monday, March 6, 2023 2:08 PM To: Chandran, Arun Subject: Re: [OMPI users] What is the best choice of pml
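For context, a sketch of how xpmem is typically requested with ob1/vader in the 4.1.x series, assuming the btl_vader_single_copy_mechanism parameter is available in the build (./perf is a placeholder binary):
$ mpirun -np 192 --mca pml ob1 --mca btl vader,self --mca btl_vader_single_copy_mechanism xpmem ./perf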

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Chandran, Arun via users
[Public] Hi Gilles, Thanks very much for the information. I was looking for the best pml + btl combination for a standalone intra-node setup with a high task count (>= 192) and no HPC-class networking installed. Just now realized that I can't use pml ucx for such cases as it is unable to find IB and

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-05 Thread Gilles Gouaillardet via users
Arun, First, Open MPI selects a pml for **all** the MPI tasks (for example, pml/ucx or pml/ob1). Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct, btl/vader) is used for each pair of MPI tasks (tasks on the same node will use btl/vader, tasks on different nodes will use
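A sketch of forcing that intra-node combination explicitly, with verbosity added to confirm which components actually get selected (./perf is a placeholder binary):
$ mpirun -np 64 --mca pml ob1 --mca btl vader,self --mca pml_base_verbose 10 --mca btl_base_verbose 10 ./perf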

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-05 Thread Chandran, Arun via users
[Public] Hi Folks, I can run benchmarks and find the pml+btl (ob1, ucx, uct, vader, etc.) combination that gives the best performance, but I wanted to hear from the community about what is generally used in "__high_core_count_intra_node_" cases before jumping to conclusions. As I am a

[OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-02 Thread Chandran, Arun via users
[Public] Hi Folks, As the number of cores in a socket keeps increasing, the right pml/btl combination (ucx, ob1, uct, vader, etc.) that gives the best performance in the "intra-node" scenario is important. For openmpi-4.1.4, which pml, btl combination is the best for intra-node communication in the case
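As a starting point, the components a given openmpi-4.1.4 install actually provides can be listed before benchmarking (output lines look roughly like "MCA pml: ob1 (...)"; details vary per build):
$ ompi_info | grep "MCA pml"
$ ompi_info | grep "MCA btl"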