Re: [OMPI users] Performance problems with OpenMPI 5.0.5 and UCX 1.17.0 with QLogic InfiniBand

2024-10-02 Thread Patrick Begou via users
e network is welcome. Patrick On 30/09/2024 at 18:41, Patrick Begou via users wrote: Hi Nathan, thanks for this suggestion. I understand that everything is now managed by the UCX layer. Am I wrong? These options do not seem to work with my OpenMPI 5.0.5 build. But I've built OpenMPI on
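
For reference, with OpenMPI 5.x transport selection is indeed handled by UCX; a minimal sketch of pinning it down explicitly, assuming a standard OpenMPI 5.x / UCX 1.17 install (the device name qib0:1 is only an example for a QLogic QDR HCA, and my_app is a placeholder):

    # list the devices and transports UCX actually detects on this node
    ucx_info -d
    # force the UCX PML and pin the transports explicitly
    mpirun --mca pml ucx -x UCX_TLS=rc,sm,self -x UCX_NET_DEVICES=qib0:1 -np 16 ./my_app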

Re: [OMPI users] Performance problems with OpenMPI 5.0.5 and UCX 1.17.0 with QLogic InfiniBand

2024-09-30 Thread Patrick Begou via users
On Sep 30, 2024, at 10:18 AM, Patrick Begou via users wrote: Hi, I'm working on refreshing an old cluster with AlmaLinux 9 (instead of CentOS 6 😕) and building a fresh OpenMPI 5.0.5 environment. I've reached the step where OpenMPI begins to work with UCX 1.17 and PMIx 5.0.3 but

[OMPI users] Performance problems with OpenMPI 5.0.5 and UCX 1.17.0 with QLogic InfiniBand

2024-09-30 Thread Patrick Begou via users
Hi, I'm working on refreshing an old cluster with AlmaLinux 9 (instead of CentOS 6 😕) and building a fresh OpenMPI 5.0.5 environment. I've reached the step where OpenMPI begins to work with UCX 1.17 and PMIx 5.0.3, but not completely. Nodes are using a QLogic QDR HBA with a managed QLogic switch (
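
As a point of comparison, a build against external UCX and PMIx is typically configured along these lines (a sketch; the prefixes are placeholders, not the paths from the original report):

    ./configure --prefix=/opt/openmpi/5.0.5 \
                --with-ucx=/opt/ucx/1.17.0 \
                --with-pmix=/opt/pmix/5.0.3
    make -j $(nproc) && make install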

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-21 Thread Patrick Begou via users
On 16/06/2022 at 14:30, Jeff Squyres (jsquyres) wrote: What exactly is the error that is occurring? -- Jeff Squyres jsquy...@cisco.com From: users

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users
occurring? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Patrick Begou via users Sent: Thursday, June 16, 2022 3:21 AM To: Open MPI Users Cc: Patrick Begou Subject: [OMPI users] OpenMPI and names of the nodes in a cluster Hi all, we are

[OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users
Hi all, we are facing a serious problem with OpenMPI (4.0.2) that we have deployed on a cluster. We do not manage this large cluster, and the node names do not conform to the Internet hostname standards: they contain a "_" (underscore) character. So OpenMPI complains about this and d
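
One common workaround when the node names cannot be changed is to avoid resolving them at all, for instance by listing IP addresses in the hostfile. A minimal sketch, with made-up addresses and slot counts:

    # hosts.ip: IP addresses instead of the non-compliant hostnames
    192.168.1.10 slots=16
    192.168.1.11 slots=16

    mpirun --hostfile hosts.ip -np 32 ./my_app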

Re: [OMPI users] Need help for troubleshooting OpenMPI performances

2022-04-07 Thread Patrick Begou via users
e legacy openib btl? If the former, is it built with multi-threading support? If the latter, I suggest you give UCX - built with multi-threading support - a try and see how it goes. Cheers, Gilles On Thu, Mar 24, 2022 at 5:43 PM Patrick Begou via users wrote: On 28/02/2022 at 17:56, Pa
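
Checking whether an existing UCX build already has multi-threading enabled, and rebuilding it with that support, goes roughly like this (a sketch; --enable-mt is the relevant UCX configure switch, the prefix is a placeholder):

    # print the UCX version and build configuration
    ucx_info -v
    # rebuild UCX from source with multi-threading support
    ./contrib/configure-release --prefix=/opt/ucx --enable-mt
    make -j $(nproc) && make install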

Re: [OMPI users] Need help for troubleshooting OpenMPI performances

2022-03-24 Thread Patrick Begou via users
On 28/02/2022 at 17:56, Patrick Begou via users wrote: Hi, I am facing a performance problem with OpenMPI on my cluster. In some situations my parallel code is really slow (the same binary running on a different mesh). To investigate, the Fortran code is built with the profiling option (mpifort

[OMPI users] Need help for troubleshooting OpenMPI performances

2022-02-28 Thread Patrick Begou via users
Hi, I am facing a performance problem with OpenMPI on my cluster. In some situations my parallel code is really slow (the same binary running on a different mesh). To investigate, the Fortran code is built with the profiling option (mpifort -p -O3) and launched on 91 cores. One mon.out file pe
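
With one profile written per process, the files clobber each other unless each rank runs in its own directory. A minimal wrapper sketch (the script name is hypothetical; OMPI_COMM_WORLD_RANK is set by Open MPI's mpirun):

    #!/bin/bash
    # profwrap.sh: give each rank its own working directory so the
    # per-process mon.out profiles do not overwrite each other
    d="prof.rank.${OMPI_COMM_WORLD_RANK:?not launched by mpirun}"
    mkdir -p "$d" && cd "$d"
    # use an absolute path to the binary, since we changed directory
    exec "$@"

launched as: mpirun -np 91 ./profwrap.sh /path/to/my_app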

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-Path

2021-02-08 Thread Patrick Begou via users
Debian, so I can't be much > more help. > > If I had to guess, totally pulling junk from the air, there's probably > something incompatible with PSM and OPA when running specifically on Debian > (likely due to library versioning). I don't know how common that is, so

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-Path

2021-01-27 Thread Patrick Begou via users
ot > sure if it's supposed to stop at some point. > > I'm running RHEL 7, gcc 10.1, OpenMPI 4.0.5rc2, with-ofi, > without-{psm,ucx,verbs}. > > On Tue, Jan 26, 2021 at 3:44 PM Patrick Begou via users > wrote: > > > >

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-Path

2021-01-26 Thread Patrick Begou via users
MPI app that reproduces > the problem? I can’t think of another way I can give you more help > without being able to see what’s going on. It’s always possible > there’s a bug in the PSM2 MTL but it would be surprising at this point. > > Sent from my iPad > >> On Jan 26, 20

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-Path

2021-01-26 Thread Patrick Begou via users
Hi all, I ran many tests today. I saw that an older 4.0.2 version of OpenMPI packaged with Nix was running using openib, so I added the --with-verbs option to set up this module. What I can see now is that: mpirun -hostfile $OAR_NODEFILE *--mca mtl psm -mca btl_openib_allow_ib true* - the
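
On Omni-Path hardware the two MTLs usually worth forcing are psm2 (native) and ofi (via libfabric); which one is available depends on the configure flags the build used. A sketch, with my_app as a placeholder:

    # force the cm PML with the native PSM2 MTL
    mpirun --mca pml cm --mca mtl psm2 -np 8 ./my_app
    # or go through libfabric instead
    mpirun --mca pml cm --mca mtl ofi -np 8 ./my_app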

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-Path

2021-01-26 Thread Patrick Begou via users
07 but expect 4007, but it fails too. Patrick On 25/01/2021 at 19:34, Ralph Castain via users wrote: > I think you mean add "--mca mtl ofi" to the mpirun cmd line > > >> On Jan 25, 2021, at 10:18 AM, Heinz, Michael William via users >> wrote: >>

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-Path

2021-01-25 Thread Patrick Begou via users
Hi Howard and Michael, thanks for your feedback. I did not want to write too long a mail with non-pertinent information, so I just showed how the two different builds give different results. I'm using a small test case based on my large code, the same one used to show the memory leak with mpi_Alltoallv c

[OMPI users] OpenMPI 4.0.5 error with Omni-Path

2021-01-25 Thread Patrick Begou via users
Hi, I'm trying to deploy OpenMPI 4.0.5 on the university's supercomputer: * Debian GNU/Linux 9 (stretch) * Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11) and for several days I have been hitting a bug (wrong results using MPI_AllToAllW) on this server when using Omni-Path. Running
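
For a 4.0.x build targeting Omni-Path, PSM2 support is enabled at configure time roughly as follows (a sketch; the prefix and the /usr paths are placeholders for wherever libpsm2 and libfabric are installed):

    ./configure --prefix=$HOME/openmpi-4.0.5 \
                --with-psm2=/usr \
                --with-libfabric=/usr
    make -j $(nproc) && make install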