I think you mean: add "--mca mtl ofi" to the mpirun command line.
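
For example, applied to the command from the quoted message below, that would look something like the following. This is only a sketch: it reuses the process count, verbosity setting and binary name from that message, and it assumes this Open MPI build actually includes the ofi MTL component.

```shell
# Same run as in the quoted message, but explicitly requesting the ofi MTL
# instead of letting Open MPI auto-select psm2.
# Assumption: the ofi MTL component was built into this Open MPI install.
mpirun -np 4 --mca mtl ofi --mca mtl_base_verbose 99 ./test_layout_array
```

Keeping mtl_base_verbose at 99 should make the selected MTL component visible in the output, so it can be compared against the psm2 log below.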

> On Jan 25, 2021, at 10:18 AM, Heinz, Michael William via users 
> <users@lists.open-mpi.org> wrote:
> 
> What happens if you specify -mtl ofi ?
> 
> -----Original Message-----
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Patrick Begou via 
> users
> Sent: Monday, January 25, 2021 12:54 PM
> To: users@lists.open-mpi.org
> Cc: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
> Subject: Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path
> 
> Hi Howard and Michael,
> 
> thanks for your feedback. I did not want to write a too long mail with
> non-pertinent information, so I just showed how the two different builds
> give different results. I'm using a small test case based on my large code,
> the same one used to show the memory leak with mpi_Alltoallv calls, but
> running just 2 iterations. It is a 2D case, and data storage is moved from a
> distribution "along the X axis" to one "along the Y axis" with mpi_Alltoallv
> and subarray types. Data initialization is based on the location in the
> array, to allow checking for correct exchanges.
> 
> When the program runs correctly (on 4 processes in my test) it should only
> show the max RSS size of the processes. When it fails it shows the invalid
> locations. I've drastically reduced the size of the problem, to nx=5 and ny=7.
> 
> Launching the non-working setup with more details shows:
> 
> dahu138 : mpirun -np 4 -mca mtl_base_verbose 99 ./test_layout_array
> [dahu138:115761] mca: base: components_register: registering framework mtl components
> [dahu138:115763] mca: base: components_register: registering framework mtl components
> [dahu138:115763] mca: base: components_register: found loaded component psm2
> [dahu138:115763] mca: base: components_register: component psm2 register function successful
> [dahu138:115763] mca: base: components_open: opening mtl components
> [dahu138:115763] mca: base: components_open: found loaded component psm2
> [dahu138:115761] mca: base: components_register: found loaded component psm2
> [dahu138:115763] mca: base: components_open: component psm2 open function successful
> [dahu138:115761] mca: base: components_register: component psm2 register function successful
> [dahu138:115761] mca: base: components_open: opening mtl components
> [dahu138:115761] mca: base: components_open: found loaded component psm2
> [dahu138:115761] mca: base: components_open: component psm2 open function successful
> [dahu138:115760] mca: base: components_register: registering framework mtl components
> [dahu138:115760] mca: base: components_register: found loaded component psm2
> [dahu138:115760] mca: base: components_register: component psm2 register function successful
> [dahu138:115760] mca: base: components_open: opening mtl components
> [dahu138:115760] mca: base: components_open: found loaded component psm2
> [dahu138:115762] mca: base: components_register: registering framework mtl components
> [dahu138:115762] mca: base: components_register: found loaded component psm2
> [dahu138:115760] mca: base: components_open: component psm2 open function successful
> [dahu138:115762] mca: base: components_register: component psm2 register function successful
> [dahu138:115762] mca: base: components_open: opening mtl components
> [dahu138:115762] mca: base: components_open: found loaded component psm2
> [dahu138:115762] mca: base: components_open: component psm2 open function successful
> [dahu138:115760] mca:base:select: Auto-selecting mtl components
> [dahu138:115760] mca:base:select:(  mtl) Querying component [psm2]
> [dahu138:115760] mca:base:select:(  mtl) Query of component [psm2] set priority to 40
> [dahu138:115761] mca:base:select: Auto-selecting mtl components
> [dahu138:115762] mca:base:select: Auto-selecting mtl components
> [dahu138:115762] mca:base:select:(  mtl) Querying component [psm2]
> [dahu138:115762] mca:base:select:(  mtl) Query of component [psm2] set priority to 40
> [dahu138:115762] mca:base:select:(  mtl) Selected component [psm2]
> [dahu138:115762] select: initializing mtl component psm2
> [dahu138:115761] mca:base:select:(  mtl) Querying component [psm2]
> [dahu138:115761] mca:base:select:(  mtl) Query of component [psm2] set priority to 40
> [dahu138:115761] mca:base:select:(  mtl) Selected component [psm2]
> [dahu138:115761] select: initializing mtl component psm2
> [dahu138:115760] mca:base:select:(  mtl) Selected component [psm2]
> [dahu138:115760] select: initializing mtl component psm2
> [dahu138:115763] mca:base:select: Auto-selecting mtl components
> [dahu138:115763] mca:base:select:(  mtl) Querying component [psm2]
> [dahu138:115763] mca:base:select:(  mtl) Query of component [psm2] set priority to 40
> [dahu138:115763] mca:base:select:(  mtl) Selected component [psm2]
> [dahu138:115763] select: initializing mtl component psm2
> [dahu138:115761] select: init returned success
> [dahu138:115761] select: component psm2 selected
> [dahu138:115762] select: init returned success
> [dahu138:115762] select: component psm2 selected
> [dahu138:115763] select: init returned success
> [dahu138:115763] select: component psm2 selected
> [dahu138:115760] select: init returned success
> [dahu138:115760] select: component psm2 selected
> On 1 found 1007 but expect 3007
> On 2 found 1007 but expect 4007
> 
> and with this setup the code freezes at this problem size.
> 
> 
> Below is the same code with my no-ib setup of openMPI on the same node:
> 
> dahu138 : mpirun -np 4 -mca mtl_base_verbose 99 ./test_layout_array
> [dahu138:116723] mca: base: components_register: registering framework mtl components
> [dahu138:116723] mca: base: components_open: opening mtl components
> [dahu138:116724] mca: base: components_register: registering framework mtl components
> [dahu138:116724] mca: base: components_open: opening mtl components
> [dahu138:116726] mca: base: components_register: registering framework mtl components
> [dahu138:116726] mca: base: components_open: opening mtl components
> [dahu138:116725] mca: base: components_register: registering framework mtl components
> [dahu138:116725] mca: base: components_open: opening mtl components
> [INFO MEMORY] : processor 0 uses  9948 kb max of resident memory
> [INFO MEMORY] : processor 0 uses  9948 kb max of resident memory
> 
> The test case used is provided as an attachment. As it runs on many
> OS/OpenMPI/hardware combinations, I do not think the problem is in the
> test case, even if that is also a possibility.
> 
> Patrick
> 

