Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Aw, sheesh. Thanks. Somehow I missed that despite being on the page; lack of focus, I guess.
Best,
Charlie
> On Jun 14, 2018, at 4:38 PM, Pavel Shamis wrote:
>
> You just have to switch PML to UCX.
> You have some example of the command line here:
>

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Pavel Shamis
You just have to switch the PML to UCX. You can find an example command line here:
https://github.com/openucx/ucx/wiki/OpenMPI-and-OpenSHMEM-installation-with-UCX
Best,
P.
On Thu, Jun 14, 2018 at 3:25 PM Charles A Taylor wrote:
> Hmmm. ompi_info only shows the ucx pml. I don’t see any
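As a minimal sketch of what switching the PML to UCX looks like (assuming Open MPI was built with UCX support, as the 3.1.0 build in this thread was), the snippet below checks that ompi_info reports a ucx pml component and selects it through the OMPI_MCA_pml environment variable; passing `-mca pml ucx` to mpirun is the command-line equivalent. The helper has_ucx_pml and the binary ./my_app are hypothetical names used only for illustration.

```shell
# Hypothetical helper: succeeds if ompi_info output on stdin lists a ucx PML component,
# i.e. a line such as "MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.0)".
has_ucx_pml() {
    grep -q 'MCA pml: ucx'
}

# Select the UCX PML for every mpirun/mpiexec started from this shell.
export OMPI_MCA_pml=ucx

# Only launch when ompi_info reports the ucx PML and the hypothetical ./my_app exists.
if command -v ompi_info >/dev/null 2>&1 && ompi_info | has_ucx_pml; then
    if [ -x ./my_app ]; then
        # Same selection as the environment variable, given explicitly on the command line.
        mpirun -mca pml ucx -np 2 ./my_app
    fi
fi
```

Setting the variable once in a job script keeps individual mpirun lines unchanged, which is convenient when comparing PMLs.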

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Hmmm. ompi_info only shows the ucx pml. I don’t see any “transports”. Will they show up somewhere, or are they documented? Right now it looks like the only UCX-related thing I can do with openmpi 3.1.0 is
export OMPI_MCA_pml=ucx
mpiexec …
From ompi_info…
$ ompi_info --param all all |
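ompi_info describes Open MPI's own components, so UCX's transports are reported by UCX's tooling instead. A minimal sketch, assuming the ucx_info utility from the UCX installation is on the PATH: `ucx_info -d` prints each device together with its "Transport:" entries, and the UCX_TLS environment variable restricts which transports UCX will use. The value rc,sm,self is only an example for an InfiniBand cluster, and the helper list_transports is hypothetical.

```shell
# Hypothetical helper: print the distinct transport names found in `ucx_info -d` output on stdin.
list_transports() {
    sed -n 's/.*Transport: *//p' | sort -u
}

# Show which transports UCX detected on this host, if ucx_info is installed.
if command -v ucx_info >/dev/null 2>&1; then
    ucx_info -d | list_transports
fi

# Example values only: restrict UCX to RC, shared memory and loopback, then use the UCX PML.
export UCX_TLS=rc,sm,self
export OMPI_MCA_pml=ucx
```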

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Cabral, Matias A
Hey Jeff,
I will help with the OFI part.
Thanks,
_MAC
-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Jeff Squyres (jsquyres) via users
Sent: Thursday, June 14, 2018 12:50 PM
To: Open MPI User's List
Cc: Jeff Squyres (jsquyres)
Subject: Re:

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Jeff Squyres (jsquyres) via users
Yeah, keeping the documentation/FAQ up to date is... difficult. :-( We could definitely use some help with that. Does anyone have some cycles to help update our FAQ, perchance?
> On Jun 14, 2018, at 11:08 AM, Charles A Taylor wrote:
>
> Thank you, Jeff.
>
> The ofi MTL with the verbs

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Thank you, Jeff. The ofi MTL with the verbs provider seems to be working well at the moment. I’ll need to let it run for a day or so before I know whether we can avoid the deadlocks we experienced with the straight openib BTL. I’ve also built in UCX support, so I’ll be trying that next. Again,

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Jeff Squyres (jsquyres) via users
Charles -- It may have gotten lost in the middle of this thread, but the vendor-recommended way of running on InfiniBand these days is with UCX. I.e., install OpenUCX and use one of the UCX transports in Open MPI. Unless you have special requirements, you should likely give this a try and

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Hi Matias,
Thanks for the response. As of a couple of hours ago, we are running:
libfabric-devel-1.5.3-1.el7.x86_64
libfabric-1.5.3-1.el7.x86_64
As for the provider, I saw that one but had listed just “verbs”. I’ll use “verbs;ofi_rxm” going forward.
Regards,
Charlie
> On Jun

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Cabral, Matias A
Hi Charles,
What version of libfabric do you have installed? To run OMPI using the verbs provider, you need to pair it with the ofi_rxm provider. fi_info should list it like:
…
provider: verbs;ofi_rxm
…
So in your command line you have to specify:
mpirun -mca pml cm -mca mtl ofi -mca
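The command line above is cut off, so the sketch below does not attempt to reproduce it. It shows one way to combine the cm PML and the OFI MTL with the layered verbs;ofi_rxm provider, under two stated assumptions: that Open MPI's mtl_ofi_provider_include MCA parameter (set here through its OMPI_MCA_ environment form) is what limits the OFI MTL to a given libfabric provider, and that ./my_app is a hypothetical application binary.

```shell
# Route point-to-point traffic through the cm PML and the OFI (libfabric) MTL.
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=ofi
# Assumption: mtl_ofi_provider_include limits the OFI MTL to the named provider.
# The verbs provider is paired with ofi_rxm, matching how fi_info names it.
export OMPI_MCA_mtl_ofi_provider_include="verbs;ofi_rxm"

# Launch only when mpirun and the hypothetical ./my_app are available.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_app ]; then
    mpirun -np 2 ./my_app
fi
```

The value is quoted because the semicolon would otherwise end the shell command.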

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
FYI…
GIZMO: prov/verbs/src/ep_rdm/verbs_tagged_ep_rdm.c:443: fi_ibv_rdm_tagged_release_remote_sbuff: Assertion `0' failed.
GIZMO:10405 terminated with signal 6 at PC=2add5835c1f7 SP=7fff8071b008. Backtrace:
/usr/lib64/libc.so.6(gsignal+0x37)[0x2add5835c1f7]

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
I see what you mean. Below is the output (filtered for a single host). Our setup is very generic:
- Dell SOS6320 hosts (Haswell)
- Mellanox ConnectX-3 HCAs (mlx4 drivers: native RHEL, not MOFED)
- FDR/EDR switches (stand-alone OpenSM)
- RHEL 7.4
- Slurm 16.05.11
- PMIx (pmix-1.1.5-1.el7.x86_64)
- openmpi

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Howard Pritchard
Hello Charles,
You are heading in the right direction. First, you might want to run the libfabric fi_info command to see what capabilities you picked up from the libfabric RPMs. Next, you may well not actually be using the OFI MTL. Could you run your app with export
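A minimal sketch of the first check Howard suggests, assuming the fi_info utility from the libfabric RPMs is on the PATH: fi_info prints one `provider:` line per reported interface, so filtering those lines summarizes what the installation offers, and a second check looks for the `provider: verbs;ofi_rxm` entry mentioned elsewhere in this thread. The helper has_verbs_rxm is hypothetical.

```shell
# Hypothetical helper: succeeds if fi_info output on stdin lists the layered verbs;ofi_rxm provider.
has_verbs_rxm() {
    grep -q 'provider: verbs;ofi_rxm'
}

if command -v fi_info >/dev/null 2>&1; then
    # Summarize which providers fi_info reports (one line per distinct provider).
    fi_info | grep 'provider:' | sort -u
    # Check for the verbs provider paired with ofi_rxm.
    if fi_info | has_verbs_rxm; then
        echo "verbs;ofi_rxm provider found"
    else
        echo "verbs;ofi_rxm provider not found"
    fi
fi
```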

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Gilles Gouaillardet
Charles,
If you are using InfiniBand hardware, the recommended way is to use UCX.
Cheers,
Gilles
On Thursday, June 14, 2018, Charles A Taylor wrote:
> Because of the issues we are having with OpenMPI and the openib BTL
> (questions previously asked), I’ve been looking into what other

[OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Because of the issues we are having with OpenMPI and the openib BTL (questions previously asked), I’ve been looking into what other transports are available. I was particularly interested in OFI/libfabric support but cannot find any information on it more recent than a reference to the usNIC