Hi Ralph and Michael,

The ofi MTL is not available in my default build on this host (the only
option I set there was --without-verbs):

    dahu112 : mpirun -np 4 --mca mtl ofi ./test_layout_array
    --------------------------------------------------------------------------
    A requested component was not found, or was unable to be opened.  This
    means that this component is either not installed or is unable to be
    used on your system (e.g., sometimes this means that shared libraries
    that the component requires are unable to be found/loaded).  Note that
    Open MPI stopped checking at the first component that it did not find.

    Host:      dahu112
    Framework: mtl
    Component: ofi

So I compiled OpenMPI 4.0.5 with the following options (I am not sure they
are correct):

    CC=$(which gcc) CXX=$(which g++) FC=$(which gfortran) ../configure \
    --with-hwloc --enable-mpirun-prefix-by-default \
    --prefix=/bettik/begou/OpenMPI405-ofi --enable-mpi1-compatibility \
    --enable-mpi-cxx --enable-cxx-exceptions --without-verbs --with-ofi \
    --without-psm --without-psm2 \
    --without-slurm


And launched:

    dahu34 : mpirun --mca mtl ofi -np 4 -mca mtl_base_verbose 99 ./test_layout_array
    [dahu34:44662] mca: base: components_register: registering framework
    mtl components
    [dahu34:44662] mca: base: components_register: found loaded
    component ofi
    [dahu34:44662] mca: base: components_register: component ofi
    register function successful
    [dahu34:44662] mca: base: components_open: opening mtl components
    [dahu34:44662] mca: base: components_open: found loaded component ofi
    [dahu34:44662] mca: base: components_open: component ofi open
    function successful
    [dahu34:44663] mca: base: components_register: registering framework
    mtl components
    [dahu34:44663] mca: base: components_register: found loaded
    component ofi
    [dahu34:44663] mca: base: components_register: component ofi
    register function successful
    [dahu34:44663] mca: base: components_open: opening mtl components
    [dahu34:44663] mca: base: components_open: found loaded component ofi
    [dahu34:44663] mca: base: components_open: component ofi open
    function successful
    [dahu34:44665] mca: base: components_register: registering framework
    mtl components
    [dahu34:44665] mca: base: components_register: found loaded
    component ofi
    [dahu34:44665] mca: base: components_register: component ofi
    register function successful
    [dahu34:44665] mca: base: components_open: opening mtl components
    [dahu34:44665] mca: base: components_open: found loaded component ofi
    [dahu34:44665] mca: base: components_open: component ofi open
    function successful
    [dahu34:44664] mca: base: components_register: registering framework
    mtl components
    [dahu34:44664] mca: base: components_register: found loaded
    component ofi
    [dahu34:44664] mca: base: components_register: component ofi
    register function successful
    [dahu34:44664] mca: base: components_open: opening mtl components
    [dahu34:44664] mca: base: components_open: found loaded component ofi
    [dahu34:44664] mca: base: components_open: component ofi open
    function successful
    [dahu34:44662] mca:base:select: Auto-selecting mtl components
    [dahu34:44662] mca:base:select:(  mtl) Querying component [ofi]
    [dahu34:44662] mca:base:select:(  mtl) Query of component [ofi] set
    priority to 25
    [dahu34:44662] mca:base:select:(  mtl) Selected component [ofi]
    [dahu34:44662] select: initializing mtl component ofi
    [dahu34:44664] mca:base:select: Auto-selecting mtl components
    [dahu34:44665] mca:base:select: Auto-selecting mtl components
    [dahu34:44665] mca:base:select:(  mtl) Querying component [ofi]
    [dahu34:44665] mca:base:select:(  mtl) Query of component [ofi] set
    priority to 25
    [dahu34:44664] mca:base:select:(  mtl) Querying component [ofi]
    [dahu34:44665] mca:base:select:(  mtl) Selected component [ofi]
    [dahu34:44665] select: initializing mtl component ofi
    [dahu34:44664] mca:base:select:(  mtl) Query of component [ofi] set
    priority to 25
    [dahu34:44664] mca:base:select:(  mtl) Selected component [ofi]
    [dahu34:44663] mca:base:select: Auto-selecting mtl components
    [dahu34:44664] select: initializing mtl component ofi
    [dahu34:44663] mca:base:select:(  mtl) Querying component [ofi]
    [dahu34:44663] mca:base:select:(  mtl) Query of component [ofi] set
    priority to 25
    [dahu34:44663] mca:base:select:(  mtl) Selected component [ofi]
    [dahu34:44663] select: initializing mtl component ofi
    [dahu34:44662]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:315:
    mtl:ofi:provider_include = "(null)"
    [dahu34:44662]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:318:
    mtl:ofi:provider_exclude = "shm,sockets,tcp,udp,rstream"
    [dahu34:44662]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:347:
    mtl:ofi:prov: psm2
    [dahu34:44665]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:315:
    mtl:ofi:provider_include = "(null)"
    [dahu34:44665]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:318:
    mtl:ofi:provider_exclude = "shm,sockets,tcp,udp,rstream"
    [dahu34:44665]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:347:
    mtl:ofi:prov: psm2
    [dahu34:44663]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:315:
    mtl:ofi:provider_include = "(null)"
    [dahu34:44663]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:318:
    mtl:ofi:provider_exclude = "shm,sockets,tcp,udp,rstream"
    [dahu34:44663]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:347:
    mtl:ofi:prov: psm2
    [dahu34:44664]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:315:
    mtl:ofi:provider_include = "(null)"
    [dahu34:44664]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:318:
    mtl:ofi:provider_exclude = "shm,sockets,tcp,udp,rstream"
    [dahu34:44664]
    ../../../../../ompi/mca/mtl/ofi/mtl_ofi_component.c:347:
    mtl:ofi:prov: psm2
    [dahu34:44665] select: init returned success
    [dahu34:44662] select: init returned success
    [dahu34:44662] select: component ofi selected
    [dahu34:44665] select: component ofi selected
    [dahu34:44663] select: init returned success
    [dahu34:44663] select: component ofi selected
    [dahu34:44664] select: init returned success
    [dahu34:44664] select: component ofi selected
    On 1 found 1007 but expect 3007
    On 2 found 1007 but expect 4007


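but it fails too, with the same wrong values as the psm2 build (the log
above shows the ofi MTL picking the psm2 provider).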
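
For readers without the attachment, the self-check behind the "found ...
but expect ..." lines can be sketched as follows. This is a minimal
single-process Python simulation, not the actual test program: the encoding
1000*i + j is only a guess consistent with values such as 3007, and the
names encode, block_range, redistribute_x_to_y and check are hypothetical.
Plain loops stand in for the mpi_Alltoallv exchange.

```python
# Sketch only: simulates the X-to-Y redistribution in one process.
# ASSUMPTION: each cell stores 1000*i + j (1-based indices); the real
# test may encode locations differently.

def encode(i, j):
    """Value stored at global location (i, j)."""
    return 1000 * i + j


def block_range(n, nprocs, rank):
    """1-based contiguous index range owned by `rank` in a block split."""
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start + 1, start + size + 1)


def redistribute_x_to_y(nx, ny, nprocs):
    """Move data from an 'along X' split to an 'along Y' split."""
    # Initial layout: each rank owns a block of i and every j.
    x_layout = [
        {(i, j): encode(i, j)
         for i in block_range(nx, nprocs, r)
         for j in range(1, ny + 1)}
        for r in range(nprocs)
    ]
    # Target layout: each rank owns every i and a block of j.
    y_layout = [dict() for _ in range(nprocs)]
    for src in range(nprocs):
        for dst in range(nprocs):
            # Sub-block that src would send to dst in the all-to-all.
            for i in block_range(nx, nprocs, src):
                for j in block_range(ny, nprocs, dst):
                    y_layout[dst][(i, j)] = x_layout[src][(i, j)]
    return y_layout


def check(y_layout, nx, ny, nprocs):
    """Return (rank, found, expected) for every misplaced value."""
    errors = []
    for rank, local in enumerate(y_layout):
        for j in block_range(ny, nprocs, rank):
            for i in range(1, nx + 1):
                found = local.get((i, j))
                expected = encode(i, j)
                if found != expected:
                    errors.append((rank, found, expected))
    return errors


if __name__ == "__main__":
    layout = redistribute_x_to_y(nx=5, ny=7, nprocs=4)
    for rank, found, expected in check(layout, 5, 7, 4):
        print(f"On {rank} found {found} but expect {expected}")
```

With a correct exchange the check reports nothing; a value that lands in
the wrong cell is reported together with the value that should be there.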

Patrick

Le 25/01/2021 à 19:34, Ralph Castain via users a écrit :
> I think you mean add "--mca mtl ofi" to the mpirun cmd line
>
>
>> On Jan 25, 2021, at 10:18 AM, Heinz, Michael William via users 
>> <users@lists.open-mpi.org> wrote:
>>
>> What happens if you specify -mtl ofi ?
>>
>> -----Original Message-----
>> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Patrick Begou 
>> via users
>> Sent: Monday, January 25, 2021 12:54 PM
>> To: users@lists.open-mpi.org
>> Cc: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
>> Subject: Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path
>>
>> Hi Howard and Michael,
>>
>> thanks for your feedback. I did not want to write an overly long mail with 
>> irrelevant information, so I just showed how the two different builds give 
>> different results. I'm using a small test case based on my large code, the 
>> same one used to show the memory leak with mpi_Alltoallv calls, but running 
>> just 2 iterations. It is a 2D case and data storage is moved from a 
>> distribution "along X axis" to "along Y axis" with mpi_Alltoallv and 
>> subarray types. Data initialization is based on the location in the array 
>> to allow checking for correct exchanges.
>>
>> When the program runs (on 4 processes in my test) it must only show the max 
>> rss size of the processes. When it fails it shows the invalid locations. 
>> I've drastically reduced the size of the problem with nx=5 and ny=7.
>>
>> Launching the non-working setup with more verbosity shows:
>>
>> dahu138 : mpirun -np 4 -mca mtl_base_verbose 99 ./test_layout_array
>> [dahu138:115761] mca: base: components_register: registering framework mtl components
>> [dahu138:115763] mca: base: components_register: registering framework mtl components
>> [dahu138:115763] mca: base: components_register: found loaded component psm2
>> [dahu138:115763] mca: base: components_register: component psm2 register function successful
>> [dahu138:115763] mca: base: components_open: opening mtl components
>> [dahu138:115763] mca: base: components_open: found loaded component psm2
>> [dahu138:115761] mca: base: components_register: found loaded component psm2
>> [dahu138:115763] mca: base: components_open: component psm2 open function successful
>> [dahu138:115761] mca: base: components_register: component psm2 register function successful
>> [dahu138:115761] mca: base: components_open: opening mtl components
>> [dahu138:115761] mca: base: components_open: found loaded component psm2
>> [dahu138:115761] mca: base: components_open: component psm2 open function successful
>> [dahu138:115760] mca: base: components_register: registering framework mtl components
>> [dahu138:115760] mca: base: components_register: found loaded component psm2
>> [dahu138:115760] mca: base: components_register: component psm2 register function successful
>> [dahu138:115760] mca: base: components_open: opening mtl components
>> [dahu138:115760] mca: base: components_open: found loaded component psm2
>> [dahu138:115762] mca: base: components_register: registering framework mtl components
>> [dahu138:115762] mca: base: components_register: found loaded component psm2
>> [dahu138:115760] mca: base: components_open: component psm2 open function successful
>> [dahu138:115762] mca: base: components_register: component psm2 register function successful
>> [dahu138:115762] mca: base: components_open: opening mtl components
>> [dahu138:115762] mca: base: components_open: found loaded component psm2
>> [dahu138:115762] mca: base: components_open: component psm2 open function successful
>> [dahu138:115760] mca:base:select: Auto-selecting mtl components
>> [dahu138:115760] mca:base:select:(  mtl) Querying component [psm2]
>> [dahu138:115760] mca:base:select:(  mtl) Query of component [psm2] set priority to 40
>> [dahu138:115761] mca:base:select: Auto-selecting mtl components
>> [dahu138:115762] mca:base:select: Auto-selecting mtl components
>> [dahu138:115762] mca:base:select:(  mtl) Querying component [psm2]
>> [dahu138:115762] mca:base:select:(  mtl) Query of component [psm2] set priority to 40
>> [dahu138:115762] mca:base:select:(  mtl) Selected component [psm2]
>> [dahu138:115762] select: initializing mtl component psm2
>> [dahu138:115761] mca:base:select:(  mtl) Querying component [psm2]
>> [dahu138:115761] mca:base:select:(  mtl) Query of component [psm2] set priority to 40
>> [dahu138:115761] mca:base:select:(  mtl) Selected component [psm2]
>> [dahu138:115761] select: initializing mtl component psm2
>> [dahu138:115760] mca:base:select:(  mtl) Selected component [psm2]
>> [dahu138:115760] select: initializing mtl component psm2
>> [dahu138:115763] mca:base:select: Auto-selecting mtl components
>> [dahu138:115763] mca:base:select:(  mtl) Querying component [psm2]
>> [dahu138:115763] mca:base:select:(  mtl) Query of component [psm2] set priority to 40
>> [dahu138:115763] mca:base:select:(  mtl) Selected component [psm2]
>> [dahu138:115763] select: initializing mtl component psm2
>> [dahu138:115761] select: init returned success
>> [dahu138:115761] select: component psm2 selected
>> [dahu138:115762] select: init returned success
>> [dahu138:115762] select: component psm2 selected
>> [dahu138:115763] select: init returned success
>> [dahu138:115763] select: component psm2 selected
>> [dahu138:115760] select: init returned success
>> [dahu138:115760] select: component psm2 selected
>> On 1 found 1007 but expect 3007
>> On 2 found 1007 but expect 4007
>>
>> and with this setup the code freezes at this problem size.
>>
>>
>> Below is the same code with my no-ib setup of openMPI on the same node:
>>
>> dahu138 : mpirun -np 4 -mca mtl_base_verbose 99 ./test_layout_array
>> [dahu138:116723] mca: base: components_register: registering framework mtl components
>> [dahu138:116723] mca: base: components_open: opening mtl components
>> [dahu138:116724] mca: base: components_register: registering framework mtl components
>> [dahu138:116724] mca: base: components_open: opening mtl components
>> [dahu138:116726] mca: base: components_register: registering framework mtl components
>> [dahu138:116726] mca: base: components_open: opening mtl components
>> [dahu138:116725] mca: base: components_register: registering framework mtl components
>> [dahu138:116725] mca: base: components_open: opening mtl components
>> [INFO MEMORY] : processor 0 uses  9948 kb max of resident memory
>> [INFO MEMORY] : processor 0 uses  9948 kb max of resident memory
>>
>> The test case used is provided as an attachment, but as it runs on many 
>> OS/OpenMPI/hardware combinations I do not think the problem is in the 
>> test case, even if that remains a possibility.
>>
>> Patrick
>>
>
