Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-22 Thread Cabral, Matias A
Hi Matt, There seem to be two different issues here: a) The warning message comes from the openib btl. Given that Omnipath has verbs API and you have the necessary libraries in your system, openib btl finds itself as a potential transport and prints the warning during its init (openib

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-18 Thread Cabral, Matias A
Hi Matt, A few comments/questions: - If your cluster has Omni-Path, you won’t need UCX. Instead you can run using PSM2, or alternatively OFI (a.k.a. Libfabric) - With the command you shared below (4 ranks on the local node) (I think) a shared mem transport is being selected

Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-16 Thread Cabral, Matias A
mtl ofi -mca pml cm -mca mtl_ofi_provider_include psm2 ./a Hello World from proccess 0 out of 2 This is process 0 reporting:: Hello World from proccess 1 out of 2 Process 1 received number 10 from process 0 From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Cabral, Matias A Sent

Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-15 Thread Cabral, Matias A
a mtl_ofi_provider_include psm2 ./a Hello World from proccess 0 out of 2 This is process 0 reporting:: Hello World from proccess 1 out of 2 Process 1 received number 10 from process 0 From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Cabral, Matias A Sent: Friday, January 11, 20

Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-11 Thread Cabral, Matias A
rom: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Cabral, Matias A Sent: Friday, January 11, 2019 3:22 PM To: Open MPI Users Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send Hi Eduardo, The OFI MTL got some new features during 2018 that went into v4.

Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-11 Thread Cabral, Matias A
Hi Eduardo, The OFI MTL got some new features during 2018 that went into v4.0.0 but are not backported to older OMPI versions. What version of libfabric are you using and where are you installing it from? I will try to reproduce your error. I'm running some quick tests and I see it working:

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Cabral, Matias A
Hey Jeff, I will help with the OFI part. Thanks, _MAC -Original Message- From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Jeff Squyres (jsquyres) via users Sent: Thursday, June 14, 2018 12:50 PM To: Open MPI User's List Cc: Jeff Squyres (jsquyres) Subject: Re:

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Cabral, Matias A
Hi Charles, What version of libfabric do you have installed? To run OMPI using the verbs provider you need to pair it with the ofi_rxm provider. fi_info should list it like: … provider: verbs;ofi_rxm … So in your command line you have to specify: mpirun -mca pml cm -mca mtl ofi -mca
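The provider check described in this message can be sketched as a small shell helper. This is only a minimal sketch, not something posted in the thread: the function name `has_verbs_rxm` is hypothetical, and it assumes only that `fi_info` prints a line of the form `provider: verbs;ofi_rxm`, as quoted above.

```shell
#!/bin/sh
# Hypothetical helper: reads fi_info-style output on stdin and succeeds
# only if the verbs provider is listed paired with ofi_rxm.
has_verbs_rxm() {
    grep -q 'provider: verbs;ofi_rxm'
}
```

On a node with libfabric installed it could be used as `fi_info | has_verbs_rxm && echo "verbs;ofi_rxm available"`.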

Re: [OMPI users] OpenMPI with PSM on True Scale with OmniPath drivers

2018-01-22 Thread Cabral, Matias A
Hi William, A couple of other questions: - Please share what your ompi configure line looks like. - Please clarify which is/are the compat libraries you refer to. There are some that are actually for the opposite case: Making TS app/libs run on Omnipath. - As Gilles mentions, moving to a newer

Re: [OMPI users] Received eager message(s) from an unknown process error on KNL

2017-04-28 Thread Cabral, Matias A
Hi Esthela, As George mentions, this is indeed libpsm2 printing this error. Opcode=0xCC is a disconnect retry. There are a few scenarios that could be happening, but to simplify, it is a message from an already disconnected endpoint arriving late. What version of Intel Omni-Path Software

Re: [OMPI users] openmpi single node jobs using btl openib

2017-02-08 Thread Cabral, Matias A
Hi Jingchao, The log shows the psm mtl is being selected. … [c1725.crane.hcc.unl.edu:187002] select: init returned priority 20 [c1725.crane.hcc.unl.edu:187002] selected cm best priority 30 [c1725.crane.hcc.unl.edu:187002] select: component ob1 not selected / finalized

Re: [OMPI users] Severe performance issue with PSM2 and single-node CP2K jobs

2017-02-08 Thread Cabral, Matias A
Hi Hristo, As you mention, I have seen that the sm btl shows better performance for smaller messages than the PSM2 shm device does, by running some osu benchmarks (especially BW for msg<256B). I even suspect that the difference would be more notable if you test the vader btl. However, the piece

Re: [OMPI users] Open MPI Java Error

2017-02-08 Thread Cabral, Matias A
Hi Thyago, psm is the user library to run with Intel TrueScale cards. psm2 is for Intel OmniPath. There is a current problem in the libraries with OMPI java bindings: https://www.open-mpi.org/faq/?category=java#java_limitations thanks, _MAC From: users [mailto:users-boun...@lists.open-mpi.org]

Re: [OMPI users] Error using hpcc benchmark

2017-02-01 Thread Cabral, Matias A
Hi Wodel, As you already figured out, mpirun -x

Re: [OMPI users] Error using hpcc benchmark

2017-01-31 Thread Cabral, Matias A
Hi Wodel, As Howard mentioned, this is probably because many ranks are sending to a single one and exhausting the receive requests MQ. You can individually enlarge the receive/send requests queues with the specific variables (PSM_MQ_RECVREQS_MAX/ PSM_MQ_SENDREQS_MAX) or increase both with
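A rough sketch of how the two per-queue variables named in this message could be passed to the ranks with `mpirun -x`. The helper name `build_mq_cmd`, the rank count, the queue size and the `./hpcc` binary path are all illustrative assumptions, not values from the thread. The function only prints the command line (a dry run), so it can be inspected before anything is launched.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints an mpirun command that exports the
# PSM receive/send request queue sizes to every rank via -x.
#   $1 = number of ranks, $2 = queue size (illustrative), rest = program
build_mq_cmd() {
    np="$1"
    reqs="$2"
    shift 2
    echo mpirun -np "$np" \
        -x "PSM_MQ_RECVREQS_MAX=$reqs" \
        -x "PSM_MQ_SENDREQS_MAX=$reqs" \
        "$@"
}
```

For example, `build_mq_cmd 4 1048576 ./hpcc` prints the command with both variables set to the same (arbitrary) value. A suitable size depends on the job and is not stated in the snippet above.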

Re: [OMPI users] device failed to appear .. Connection timed out

2016-12-08 Thread Cabral, Matias A
>Anyway, /dev/hfi1_0 doesn't exist. Make sure you have the hfi1 module/driver loaded. In addition, please confirm the links are in active state on all the nodes `opainfo` _MAC From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard Sent: Thursday, December 08, 2016
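The two checks suggested here (device node present, hfi1 driver loaded) might be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and reading `/proc/modules` is one way among several to see whether a kernel module is loaded on Linux.

```shell
#!/bin/sh
# Hypothetical helper: report whether the hfi1 device node exists.
# Defaults to /dev/hfi1_0, the path mentioned in the thread.
check_hfi1_dev() {
    dev="${1:-/dev/hfi1_0}"
    if [ -e "$dev" ]; then
        echo "device present: $dev"
    else
        echo "device missing: $dev"
        return 1
    fi
}

# Hypothetical helper: succeed if a module named hfi1 is listed.
# The optional argument (default /proc/modules) is the list to read.
hfi1_module_loaded() {
    grep -q '^hfi1 ' "${1:-/proc/modules}"
}
```

Running `check_hfi1_dev && hfi1_module_loaded` on each node, followed by `opainfo` to inspect the link state, covers the steps mentioned in the message.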

Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

2016-10-12 Thread Cabral, Matias A
ct: Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2 Thank you very much, MAC! Limin On Tue, Oct 11, 2016 at 10:15 PM, Cabral, Matias A <matias.a.cab...@intel.com<mailto:matias.a.cab...@intel.com>> wrote: Building psm2 should not be complicated (in case you c

Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

2016-10-11 Thread Cabral, Matias A
[root@uranus ~]# nm /usr/lib64/libpsm2.so.2 nm: /usr/lib64/libpsm2.so.2: no symbols [root@uranus ~]# Thanks! Limin On Tue, Oct 11, 2016 at 7:00 PM, Cabral, Matias A <matias.a.cab...@intel.com<mailto:matias.a.cab...@intel.com>> wrote: Hi Limin, psm2_mq_irecv2 should be in libpsm

Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

2016-10-11 Thread Cabral, Matias A
Hi Limin, psm2_mq_irecv2 should be in libpsm2.so. I’m not quite sure how CentOS packs it so I would like a little more info about the version being used. Some things to share: >rpm -qi libpsm2-0.7-4.el7.x86_64 > objdump -p /usr/lib64/libpsm2.so |grep SONAME >nm /usr/lib64/libpsm2.so |grep

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread Cabral, Matias A
Hi Giles et al., You are right, ptl.c is in PSM2 code. As Ralph mentions, dynamic process support was/is not working in OMPI when using PSM2 because of an issue related to the transport keys. This was fixed in PR #1602 (https://github.com/open-mpi/ompi/pull/1602) and should be included in

Re: [OMPI users] runtime performance tuning for Intel OMA interconnect

2016-08-11 Thread Cabral, Matias A
is do you know other factors that cause a delay to a MPI_Send() when the receiver is not ready to receive? On Wed, Aug 10, 2016 at 11:48 PM, Cabral, Matias A <matias.a.cab...@intel.com<mailto:matias.a.cab...@intel.com>> wrote: To remain in eager mode you need to increase the size of PSM2_M

Re: [OMPI users] runtime performance tuning for Intel OMA interconnect

2016-08-10 Thread Cabral, Matias A
ould you please elaborate on "Just in case PSM2_MQ_EAGER_SDMA_SZ changes PIO to SDMA, always in eager mode." Thanks! Michael On Wed, Aug 10, 2016 at 3:59 PM, Cabral, Matias A <matias.a.cab...@intel.com<mailto:matias.a.cab...@intel.com>> wrote: Hi Michael, When Open MPI run

Re: [OMPI users] runtime performance tuning for Intel OMA interconnect

2016-08-10 Thread Cabral, Matias A
Hi Michael, When Open MPI runs on Omni-Path it will choose the PSM2 MTL by default, to use the libpsm2.so. Strictly speaking, it is also able to run using the openib BTL. However, the performance is so significantly impacted that it is not only discouraged, but no tuning would make sense.

Re: [OMPI users] PSM vs PSM2

2016-06-02 Thread Cabral, Matias A
Hi Durga, Here is a short summary: PSM: is intended for Intel TrueScale InfiniBand product series. It is also known as PSM gen 1, uses libpsm_infinipath.so PSM2: is intended for Intel’s next generation fabric called OmniPath. PSM gen2, uses libpsm2.so. I didn’t know about the owner.txt

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Cabral, Matias A
016 5:52 AM To: Open MPI Users <us...@open-mpi.org> Subject: Re: [OMPI users] locked memory and queue pairs On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote: > I didn't go into the code to see who is actually calling this error message, > but

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Cabral, Matias A
rom: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di Domenico Sent: Wednesday, March 16, 2016 1:25 PM To: Open MPI Users <us...@open-mpi.org> Subject: Re: [OMPI users] locked memory and queue pairs On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A <matias.a.cab...@intel.com>

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Cabral, Matias A
Hi Michael, I may be missing some context, but if you are using the qlogic cards you will always want to use the psm mtl (-mca pml cm -mca mtl psm) and not openib btl. As Tom suggests, confirm the limits are set up on every node: could it be the alltoall is reaching a node that "others" are not?
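One way to confirm limits per node is sketched below. It assumes that the "limits" in question are the locked-memory (memlock) limits suggested by the thread subject. That is an assumption, as is the function name `memlock_status`.

```shell
#!/bin/sh
# Hypothetical helper: print the locked-memory limit of the current shell.
# Assumption: the limit being discussed is the memlock limit (ulimit -l).
memlock_status() {
    limit=$(ulimit -l)
    if [ "$limit" = "unlimited" ]; then
        echo "memlock ok: unlimited"
    else
        echo "memlock limited: $limit"
    fi
}
```

Running it on every node in the job (for example over ssh) makes a node with a different setting stand out. Note that an interactive shell and a process started by a resource manager can see different limits.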