Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Michael Di Domenico via users
For whatever it's worth, running the test program on my OPA cluster seems to work. Well, it keeps spitting out [INFO MEMORY] lines; not sure if it's supposed to stop at some point. I'm running RHEL 7, gcc 10.1, OpenMPI 4.0.5rc2, with-ofi, without-{psm,ucx,verbs}. On Tue, Jan 26, 2021 at 3:44 PM Patri
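
For reference, a build matching that description could be configured roughly as below; this is a sketch, and the install prefix and make options are assumptions, not taken from the thread.

    # Sketch of an OpenMPI 4.0.5rc2 build with OFI only (prefix is hypothetical)
    ./configure --prefix=/opt/openmpi-4.0.5rc2 \
        --with-ofi --without-psm --without-ucx --without-verbs
    make -j8 && make install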

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Patrick Begou via users
Hi Michael, indeed I'm a little bit lost with all these parameters in OpenMPI, mainly because for years it has worked just fine out of the box in all my deployments on various architectures, interconnects and Linux flavors. Some weeks ago I deployed OpenMPI 4.0.5 on CentOS 8 with gcc 10, Slurm and UCX on an A
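
On a UCX-enabled build like the one described, UCX is usually engaged through the PML; a minimal sketch, with host file, process count and binary name as placeholders:

    # Sketch: explicitly request the UCX PML (names are placeholders)
    mpirun --mca pml ucx -np 32 -hostfile ./hosts ./my_app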

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Heinz, Michael William via users
Patrick, how are you using the original PSM if you're using Omni-Path hardware? The original PSM was written for QLogic DDR and QDR InfiniBand adapters. As far as needing openib goes, the issue is that the PSM2 MTL doesn't support a subset of MPI operations that we previously used the pt2pt BTL for. For
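
For context, MTLs such as PSM2 run under the cm PML, while BTLs such as openib run under the ob1 PML; a sketch of selecting each path explicitly (the binary name is a placeholder):

    # Omni-Path via the PSM2 MTL under the cm PML:
    mpirun --mca pml cm --mca mtl psm2 ./my_app
    # Or the BTL path under the ob1 PML:
    mpirun --mca pml ob1 --mca btl self,vader,openib ./my_app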

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Patrick Begou via users
Hi all, I ran many tests today. I saw that an older 4.0.2 version of OpenMPI packaged with Nix was running using openib, so I added the --with-verbs option to set up this module. What I can see now is that: mpirun -hostfile $OAR_NODEFILE --mca mtl psm -mca btl_openib_allow_ib true - the
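
Cleaned up, the invocation quoted above amounts to something like the sketch below; $OAR_NODEFILE comes from the OAR batch scheduler, and the binary name is a placeholder:

    # Force the legacy PSM MTL and let the openib BTL drive the adapter:
    mpirun -hostfile $OAR_NODEFILE \
        --mca mtl psm --mca btl_openib_allow_ib true ./test_case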

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Patrick Begou via users
What happens if you specify -mtl ofi ? >> >> -----Original Message----- >> From: users On Behalf Of Patrick Begou >> via users >> Sent: Monday, January 25, 2021 12:54 PM >> To: users@lists.open-mpi.org >> Cc: Patrick Begou >> Subject: Re: [OMPI users

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
Patrick, is your application multi-threaded? PSM2 was not originally designed for multiple threads per process. I do know that the OSU alltoallV test does pass when I try it. Sent from my iPad > On Jan 25, 2021, at 12:57 PM, Patrick Begou via users > wrote: > > Hi Howard and Michael, > > t
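
For reference, the test mentioned here is part of the OSU Micro-Benchmarks; a sketch of running it, where the path inside the benchmark tree and the process count are assumptions:

    # Run the OSU alltoallv benchmark (path and ranks are placeholders):
    mpirun -np 16 -hostfile ./hosts ./mpi/collective/osu_alltoallv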

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Ralph Castain via users
; users > Sent: Monday, January 25, 2021 12:54 PM > To: users@lists.open-mpi.org > Cc: Patrick Begou > Subject: Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path > > Hi Howard and Michael, > > thanks for your feedback. I did not want to write a too long mail with non

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
What happens if you specify -mtl ofi ? -----Original Message----- From: users On Behalf Of Patrick Begou via users Sent: Monday, January 25, 2021 12:54 PM To: users@lists.open-mpi.org Cc: Patrick Begou Subject: Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path Hi Howard and Michael, thanks
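
The usual spelling of that suggestion goes through --mca; a sketch, with the binary name as a placeholder:

    # Select the OFI (libfabric) MTL explicitly:
    mpirun --mca pml cm --mca mtl ofi ./my_app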

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Patrick Begou via users
Hi Howard and Michael, thanks for your feedback. I did not want to write a too long mail with non-pertinent information, so I just show how the two different builds give different results. I'm using a small test case based on my large code, the same one used to show the memory leak with mpi_Alltoallv c
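
A leak like the one mentioned can be narrowed down by running the test case under valgrind, assuming valgrind is available on the nodes; the process count and binary name are placeholders:

    # One valgrind log per rank, looking for the Alltoallv-related leak:
    mpirun -np 4 valgrind --leak-check=full --log-file=vg.%p.log ./test_case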

[OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
Patrick, You really have to provide us with some detailed information if you want assistance. At a minimum we need to know whether you're using the PSM2 MTL or the OFI MTL and what the actual error is. Please provide the actual command line you are having problems with, along with any errors. In additio
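
A couple of commands that help collect that information; a sketch, with an arbitrary verbosity level and a placeholder binary:

    # Which MTLs/BTLs does this build actually contain?
    ompi_info | grep -E 'mtl|btl'
    # Which MTL gets selected at run time?
    mpirun --mca mtl_base_verbose 100 -np 2 ./my_app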

[OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Patrick Begou via users
Hi, I'm trying to deploy OpenMPI 4.0.5 on the university's supercomputer: * Debian GNU/Linux 9 (stretch) * Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11) and for several days I have had a bug (wrong results using MPI_AllToAllW) on this server when using Omni-Path. Running
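
For what it's worth, the adapter can be double-checked from the node itself; a sketch, noting that opainfo ships with Intel's OPA tools and may not be installed everywhere:

    # Confirm the Omni-Path HFI is visible and up:
    lspci | grep -i omni-path
    opainfo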