The ucx PML should work just fine even in a single-node scenario. As Jeff indicated, you need to move the MCA param `--mca pml ucx` before your executable on the command line.
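For example, the corrected invocation for the command shown in the quoted thread below would look something like this (same options, just reordered so mpirun actually parses them):

perf_benchmark $ mpirun --mca pml ucx -np 32 --map-by core --bind-to core ./perf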
George.

On Mon, Mar 6, 2023 at 9:48 AM Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org> wrote:

> If this run was on a single node, then UCX probably disabled itself since it wouldn't be using InfiniBand or RoCE to communicate between peers.
>
> Also, I'm not sure your command line was correct:
>
> perf_benchmark $ mpirun -np 32 --map-by core --bind-to core ./perf --mca pml ucx
>
> You probably need to list all of mpirun's CLI options *before* you list the ./perf executable. In its left-to-right traversal, once mpirun hits a CLI option it does not recognize (e.g., "./perf"), it assumes that it is the user's executable name and does not process the CLI options to the right of that.
>
> Hence, the output you show must have forced the UCX PML another way -- perhaps you set an environment variable or something?
>
> ------------------------------
> *From:* users <users-boun...@lists.open-mpi.org> on behalf of Chandran, Arun via users <users@lists.open-mpi.org>
> *Sent:* Monday, March 6, 2023 3:33 AM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Cc:* Chandran, Arun <arun.chand...@amd.com>
> *Subject:* Re: [OMPI users] What is the best choice of pml and btl for intranode communication
>
> Hi Gilles,
>
> Thanks very much for the information.
>
> I was looking for the best pml + btl combination for a standalone intra-node run with a high task count (>= 192) and no HPC-class networking installed.
>
> Just now realized that I can't use pml ucx for such cases, as it is unable to find IB and fails.
>
> perf_benchmark $ mpirun -np 32 --map-by core --bind-to core ./perf --mca pml ucx
> --------------------------------------------------------------------------
> No components were able to be opened in the pml framework.
>
> This typically means that either no components of this type were installed, or none of the installed components can be loaded. Sometimes this means that shared libraries required by these components are unable to be found/loaded.
>
>   Host:      lib-ssp-04
>   Framework: pml
> --------------------------------------------------------------------------
> [lib-ssp-04:753542] PML ucx cannot be selected
> [lib-ssp-04:753531] PML ucx cannot be selected
> [lib-ssp-04:753541] PML ucx cannot be selected
> [lib-ssp-04:753539] PML ucx cannot be selected
> [lib-ssp-04:753545] PML ucx cannot be selected
> [lib-ssp-04:753547] PML ucx cannot be selected
> [lib-ssp-04:753572] PML ucx cannot be selected
> [lib-ssp-04:753538] PML ucx cannot be selected
> [lib-ssp-04:753530] PML ucx cannot be selected
> [lib-ssp-04:753537] PML ucx cannot be selected
> [lib-ssp-04:753546] PML ucx cannot be selected
> [lib-ssp-04:753544] PML ucx cannot be selected
> [lib-ssp-04:753570] PML ucx cannot be selected
> [lib-ssp-04:753567] PML ucx cannot be selected
> [lib-ssp-04:753534] PML ucx cannot be selected
> [lib-ssp-04:753592] PML ucx cannot be selected
> [lib-ssp-04:753529] PML ucx cannot be selected
> <snip>
>
> That means my only choice is pml/ob1 + btl/vader.
>
> --Arun
>
> *From:* users <users-boun...@lists.open-mpi.org> *On Behalf Of* Gilles Gouaillardet via users
> *Sent:* Monday, March 6, 2023 12:56 PM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Cc:* Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> *Subject:* Re: [OMPI users] What is the best choice of pml and btl for intranode communication
>
> Arun,
>
> First, Open MPI selects a pml for **all** the MPI tasks (for example, pml/ucx or pml/ob1).
>
> Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct, btl/vader) is used for each pair of MPI tasks (tasks on the same node will use btl/vader, tasks on different nodes will use btl/uct).
>
> Note that if UCX is available, pml/ucx takes the highest priority, so no btl is involved (in your case, it means intra-node communications will be handled by UCX and not btl/vader).
>
> You can force ob1 and try different combinations of btl with
>
> mpirun --mca pml ob1 --mca btl self,<btl1>,<btl2> ...
>
> I expect pml/ucx is faster than pml/ob1 with btl/uct for inter-node communications.
>
> I have not benchmarked Open MPI for a while, and it is possible btl/vader outperforms pml/ucx for intra-node communications, so if you run on a small number of InfiniBand-interconnected nodes with a large number of tasks per node, you might be able to get the best performance by forcing pml/ob1.
>
> Bottom line, I think it is best for you to benchmark your application and pick the combination that leads to the best performance, and you are more than welcome to share your conclusions.
>
> Cheers,
>
> Gilles
>
> On Mon, Mar 6, 2023 at 3:12 PM Chandran, Arun via users <users@lists.open-mpi.org> wrote:
>
> Hi Folks,
>
> I can run benchmarks and find the pml+btl (ob1, ucx, uct, vader, etc.) combination that gives the best performance, but I wanted to hear from the community about what is generally used in "high_core_count_intra_node" cases before jumping to conclusions.
>
> As I am a newcomer to Open MPI, I don't want to end up using a combination only because it fared better in a benchmark (overfitting?).
>
> Or is the choice of pml+btl for the 'intra-node' case not so important, since Open MPI is mainly used 'inter-node' and the networking equipment decides the pml+btl? (UCX for IB)
>
> --Arun
>
> -----Original Message-----
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Chandran, Arun via users
> Sent: Thursday, March 2, 2023 4:01 PM
> To: users@lists.open-mpi.org
> Cc: Chandran, Arun <arun.chand...@amd.com>
> Subject: [OMPI users] What is the best choice of pml and btl for intranode communication
>
> Hi Folks,
>
> As the number of cores in a socket keeps increasing, picking the right pml/btl (ucx, ob1, uct, vader, etc.) that gives the best performance in the "intra-node" scenario is important.
>
> For openmpi-4.1.4, which pml/btl combination is best for intra-node communication in the high core-count case (p-to-p as well as coll), and why?
> Does the answer to the above question hold for the upcoming ompi5 release?
>
> --Arun
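Along the same lines, a minimal sketch of the ob1 fallback Gilles describes above, reusing the ./perf benchmark and options from this thread, could be:

perf_benchmark $ mpirun --mca pml ob1 --mca btl self,vader -np 32 --map-by core --bind-to core ./perf

On a single node this forces pml/ob1, which then uses the shared-memory BTL (vader; as far as I know it is renamed sm in Open MPI 5.x) for every local pair of ranks, so no HPC-class networking is required.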