I think you mean adding "--mca mtl ofi" to the mpirun command line.
> On Jan 25, 2021, at 10:18 AM, Heinz, Michael William via users <users@lists.open-mpi.org> wrote:
>
> What happens if you specify -mtl ofi ?
>
> -----Original Message-----
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Patrick Begou via users
> Sent: Monday, January 25, 2021 12:54 PM
> To: users@lists.open-mpi.org
> Cc: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
> Subject: Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path
>
> Hi Howard and Michael,
>
> thanks for your feedback. I did not want to write a too long mail with
> non-pertinent information, so I just showed how the two different builds give
> different results. I'm using a small test case based on my large code, the
> same one used to show the memory leak with mpi_Alltoallv calls, but running
> just 2 iterations. It is a 2D case, and data storage is moved from a
> distribution "along X axis" to one "along Y axis" with mpi_Alltoallv and
> subarray types. Data initialization is based on the location in the array to
> allow checking for correct exchanges.
>
> When the program runs correctly (on 4 processes in my test) it must only show
> the max rss size of the processes. When it fails it shows the invalid
> locations. I've drastically reduced the size of the problem, with nx=5 and
> ny=7.
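
For readers without the attachment, here is a rough sketch of the checking idea described above. It is not Patrick's test case: it is pure Python with no MPI, the all-to-all is replaced by a plain loop, and the names (encode, split, make_x_slabs, redistribute, check) are made up for illustration. The encoding 1000*i + j is only an assumption suggested by values such as 1007 and 3007 in the output quoted below; the block decomposition and rank numbering are assumptions too.

```python
"""Pure-Python sketch (no MPI) of a location-encoded redistribution check."""

NX, NY, NPROCS = 5, 7, 4  # nx, ny and process count quoted in the message


def encode(i, j):
    """Assumed encoding: 1-based global indices (i, j) -> 1000*i + j."""
    return 1000 * i + j


def split(n, nprocs):
    """Cut 1..n into nprocs contiguous blocks, as inclusive (lo, hi) pairs."""
    base, extra = divmod(n, nprocs)
    bounds, lo = [], 1
    for rank in range(nprocs):
        size = base + (1 if rank < extra else 0)
        bounds.append((lo, lo + size - 1))
        lo += size
    return bounds


def make_x_slabs(xb):
    """Distribution 'along X' (assumed): each rank owns all j for its block of i."""
    return [
        {(i, j): encode(i, j) for i in range(lo, hi + 1) for j in range(1, NY + 1)}
        for lo, hi in xb
    ]


def redistribute(x_slabs, yb):
    """Stand-in for the all-to-all: route every cell to the rank owning its j."""
    y_slabs = [{} for _ in yb]
    for slab in x_slabs:
        for (i, j), value in slab.items():
            owner = next(r for r, (lo, hi) in enumerate(yb) if lo <= j <= hi)
            y_slabs[owner][(i, j)] = value
    return y_slabs


def check(slabs):
    """Report every cell whose value does not match its global location."""
    errors = []
    for rank, slab in enumerate(slabs):
        for (i, j), value in sorted(slab.items()):
            expected = encode(i, j)
            if value != expected:
                errors.append(f"On {rank} found {value} but expect {expected}")
    return errors


if __name__ == "__main__":
    x_slabs = make_x_slabs(split(NX, NPROCS))
    y_slabs = redistribute(x_slabs, split(NY, NPROCS))
    for line in check(y_slabs):
        print(line)
```

Because each value encodes its own global position, a correct exchange produces no output from the check, and a misplaced element identifies both where it landed and where it came from.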
>
> Launching the non-working setup with more details shows:
>
> dahu138 : mpirun -np 4 -mca mtl_base_verbose 99 ./test_layout_array
> [dahu138:115761] mca: base: components_register: registering framework mtl components
> [dahu138:115763] mca: base: components_register: registering framework mtl components
> [dahu138:115763] mca: base: components_register: found loaded component psm2
> [dahu138:115763] mca: base: components_register: component psm2 register function successful
> [dahu138:115763] mca: base: components_open: opening mtl components
> [dahu138:115763] mca: base: components_open: found loaded component psm2
> [dahu138:115761] mca: base: components_register: found loaded component psm2
> [dahu138:115763] mca: base: components_open: component psm2 open function successful
> [dahu138:115761] mca: base: components_register: component psm2 register function successful
> [dahu138:115761] mca: base: components_open: opening mtl components
> [dahu138:115761] mca: base: components_open: found loaded component psm2
> [dahu138:115761] mca: base: components_open: component psm2 open function successful
> [dahu138:115760] mca: base: components_register: registering framework mtl components
> [dahu138:115760] mca: base: components_register: found loaded component psm2
> [dahu138:115760] mca: base: components_register: component psm2 register function successful
> [dahu138:115760] mca: base: components_open: opening mtl components
> [dahu138:115760] mca: base: components_open: found loaded component psm2
> [dahu138:115762] mca: base: components_register: registering framework mtl components
> [dahu138:115762] mca: base: components_register: found loaded component psm2
> [dahu138:115760] mca: base: components_open: component psm2 open function successful
> [dahu138:115762] mca: base: components_register: component psm2 register function successful
> [dahu138:115762] mca: base: components_open: opening mtl components
> [dahu138:115762] mca: base: components_open: found loaded component psm2
> [dahu138:115762] mca: base: components_open: component psm2 open function successful
> [dahu138:115760] mca:base:select: Auto-selecting mtl components
> [dahu138:115760] mca:base:select:( mtl) Querying component [psm2]
> [dahu138:115760] mca:base:select:( mtl) Query of component [psm2] set priority to 40
> [dahu138:115761] mca:base:select: Auto-selecting mtl components
> [dahu138:115762] mca:base:select: Auto-selecting mtl components
> [dahu138:115762] mca:base:select:( mtl) Querying component [psm2]
> [dahu138:115762] mca:base:select:( mtl) Query of component [psm2] set priority to 40
> [dahu138:115762] mca:base:select:( mtl) Selected component [psm2]
> [dahu138:115762] select: initializing mtl component psm2
> [dahu138:115761] mca:base:select:( mtl) Querying component [psm2]
> [dahu138:115761] mca:base:select:( mtl) Query of component [psm2] set priority to 40
> [dahu138:115761] mca:base:select:( mtl) Selected component [psm2]
> [dahu138:115761] select: initializing mtl component psm2
> [dahu138:115760] mca:base:select:( mtl) Selected component [psm2]
> [dahu138:115760] select: initializing mtl component psm2
> [dahu138:115763] mca:base:select: Auto-selecting mtl components
> [dahu138:115763] mca:base:select:( mtl) Querying component [psm2]
> [dahu138:115763] mca:base:select:( mtl) Query of component [psm2] set priority to 40
> [dahu138:115763] mca:base:select:( mtl) Selected component [psm2]
> [dahu138:115763] select: initializing mtl component psm2
> [dahu138:115761] select: init returned success
> [dahu138:115761] select: component psm2 selected
> [dahu138:115762] select: init returned success
> [dahu138:115762] select: component psm2 selected
> [dahu138:115763] select: init returned success
> [dahu138:115763] select: component psm2 selected
> [dahu138:115760] select: init returned success
> [dahu138:115760] select: component psm2 selected
> On 1 found 1007 but expect 3007
> On 2 found 1007 but expect 4007
>
> and with this setup the code freezes with this dimension of the problem.
>
>
> Below is the same code with my no-ib setup of OpenMPI on the same node:
>
> dahu138 : mpirun -np 4 -mca mtl_base_verbose 99 ./test_layout_array
> [dahu138:116723] mca: base: components_register: registering framework mtl components
> [dahu138:116723] mca: base: components_open: opening mtl components
> [dahu138:116724] mca: base: components_register: registering framework mtl components
> [dahu138:116724] mca: base: components_open: opening mtl components
> [dahu138:116726] mca: base: components_register: registering framework mtl components
> [dahu138:116726] mca: base: components_open: opening mtl components
> [dahu138:116725] mca: base: components_register: registering framework mtl components
> [dahu138:116725] mca: base: components_open: opening mtl components
> [INFO MEMORY] : processor 0 uses 9948 kb max of resident memory
> [INFO MEMORY] : processor 0 uses 9948 kb max of resident memory
>
> The test case used is provided in attachment, but as it runs on many
> OS/OpenMPI/hardware combinations I do not think the problem is in the test
> case, even if that is also a possibility.
>
> Patrick