That's not for the MPI communications but for the process-management part
(PRRTE/PMIx). If forcing the PTL to `lo` worked, it mostly indicates that
the shared memory in OMPI was set up correctly.
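
If you want that workaround to stick without adding the flag to every mpirun
invocation, one option (just a sketch, relying on the generic PMIx convention
of exporting MCA parameters as PMIX_MCA_<name> environment variables, not
anything specific to this build) is:

    # force the PMIx TCP PTL onto the loopback interface for all runs
    export PMIX_MCA_ptl_tcp_if_include=lo0
    mpirun -n 2 ./mpi_init_test

which should be equivalent to passing `--pmixmca ptl_tcp_if_include lo0`
on the command line.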

  George.


On Mon, Feb 5, 2024 at 3:47 PM John Hearns <hear...@gmail.com> wrote:

> Stupid question... Why is it going 'out' to the loopback address? Is
> shared memory not being used these days?
>
> On Mon, Feb 5, 2024, 8:31 PM John Haiducek via users <
> users@lists.open-mpi.org> wrote:
>
>> Adding '--pmixmca ptl_tcp_if_include lo0' to the mpirun argument list
>> seems to fix (or at least work around) the problem.
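>>
>> For the record, the full command is just the original mpirun line with that
>> flag added, i.e. something like:
>>
>>     mpirun --pmixmca ptl_tcp_if_include lo0 -n 2 ./mpi_init_test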
>>
>> On Mon, Feb 5, 2024 at 1:49 PM John Haiducek <jhaid...@gmail.com> wrote:
>>
>>> Thanks, George; that issue you linked certainly looks like it could be
>>> related.
>>>
>>> Output from ompi_info:
>>>
>>>                  Package: Open MPI brew@Monterey-arm64.local Distribution
>>>                 Open MPI: 5.0.1
>>>   Open MPI repo revision: v5.0.1
>>>    Open MPI release date: Dec 20, 2023
>>>                  MPI API: 3.1.0
>>>             Ident string: 5.0.1
>>>                   Prefix: /opt/homebrew/Cellar/open-mpi/5.0.1
>>>  Configured architecture: aarch64-apple-darwin21.6.0
>>>            Configured by: brew
>>>            Configured on: Wed Dec 20 22:18:10 UTC 2023
>>>           Configure host: Monterey-arm64.local
>>>   Configure command line: '--disable-debug' '--disable-dependency-tracking'
>>>                           '--prefix=/opt/homebrew/Cellar/open-mpi/5.0.1'
>>>                           '--libdir=/opt/homebrew/Cellar/open-mpi/5.0.1/lib'
>>>                           '--disable-silent-rules' '--enable-ipv6'
>>>                           '--enable-mca-no-build=reachable-netlink'
>>>                           '--sysconfdir=/opt/homebrew/etc'
>>>                           '--with-hwloc=/opt/homebrew/opt/hwloc'
>>>                           '--with-libevent=/opt/homebrew/opt/libevent'
>>>                           '--with-pmix=/opt/homebrew/opt/pmix' '--with-sge'
>>>                 Built by: brew
>>>                 Built on: Wed Dec 20 22:18:10 UTC 2023
>>>               Built host: Monterey-arm64.local
>>>               C bindings: yes
>>>              Fort mpif.h: yes (single underscore)
>>>             Fort use mpi: yes (full: ignore TKR)
>>>        Fort use mpi size: deprecated-ompi-info-value
>>>         Fort use mpi_f08: yes
>>>  Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
>>>                           limitations in the gfortran compiler and/or Open
>>>                           MPI, does not support the following: array
>>>                           subsections, direct passthru (where possible) to
>>>                           underlying Open MPI's C functionality
>>>   Fort mpi_f08 subarrays: no
>>>            Java bindings: no
>>>   Wrapper compiler rpath: unnecessary
>>>               C compiler: clang
>>>      C compiler absolute: clang
>>>   C compiler family name: CLANG
>>>       C compiler version: 14.0.0 (clang-1400.0.29.202)
>>>             C++ compiler: clang++
>>>    C++ compiler absolute: clang++
>>>            Fort compiler: gfortran
>>>        Fort compiler abs: /opt/homebrew/opt/gcc/bin/gfortran
>>>          Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
>>>    Fort 08 assumed shape: yes
>>>       Fort optional args: yes
>>>           Fort INTERFACE: yes
>>>     Fort ISO_FORTRAN_ENV: yes
>>>        Fort STORAGE_SIZE: yes
>>>       Fort BIND(C) (all): yes
>>>       Fort ISO_C_BINDING: yes
>>>  Fort SUBROUTINE BIND(C): yes
>>>        Fort TYPE,BIND(C): yes
>>>  Fort T,BIND(C,name="a"): yes
>>>             Fort PRIVATE: yes
>>>            Fort ABSTRACT: yes
>>>        Fort ASYNCHRONOUS: yes
>>>           Fort PROCEDURE: yes
>>>          Fort USE...ONLY: yes
>>>            Fort C_FUNLOC: yes
>>>  Fort f08 using wrappers: yes
>>>          Fort MPI_SIZEOF: yes
>>>              C profiling: yes
>>>    Fort mpif.h profiling: yes
>>>   Fort use mpi profiling: yes
>>>    Fort use mpi_f08 prof: yes
>>>           Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
>>>                           OMPI progress: no, Event lib: yes)
>>>            Sparse Groups: no
>>>   Internal debug support: no
>>>   MPI interface warnings: yes
>>>      MPI parameter check: runtime
>>> Memory profiling support: no
>>> Memory debugging support: no
>>>               dl support: yes
>>>    Heterogeneous support: no
>>>        MPI_WTIME support: native
>>>      Symbol vis. support: yes
>>>    Host topology support: yes
>>>             IPv6 support: yes
>>>           MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
>>>  Fault Tolerance support: yes
>>>           FT MPI support: yes
>>>   MPI_MAX_PROCESSOR_NAME: 256
>>>     MPI_MAX_ERROR_STRING: 256
>>>      MPI_MAX_OBJECT_NAME: 64
>>>         MPI_MAX_INFO_KEY: 36
>>>         MPI_MAX_INFO_VAL: 256
>>>        MPI_MAX_PORT_NAME: 1024
>>>   MPI_MAX_DATAREP_STRING: 128
>>>          MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.1)
>>>            MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>            MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>            MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                  MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.1)
>>>                  MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.1)
>>>                  MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.1)
>>>                   MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.1)
>>>                   MCA if: bsdx_ipv6 (MCA v2.1.0, API v2.0.0, Component
>>>                           v5.0.1)
>>>                   MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
>>>                           v5.0.1)
>>>          MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>          MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.1)
>>>              MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
>>>                           v5.0.1)
>>>               MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.1)
>>>            MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>              MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.1)
>>>                MCA timer: darwin (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                  MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>                 MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component
>>>                           v5.0.1)
>>>                 MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>                 MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
>>>                           v5.0.1)
>>>                MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
>>>                           v5.0.1)
>>>                MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                   MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                 MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
>>>                           v5.0.1)
>>>                   MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                   MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                  MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.1)
>>>                  MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
>>>                           v5.0.1)
>>>                  MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.1)
>>>                 MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.1)
>>>                  MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>                  MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component
>>>                           v5.0.1)
>>>                  MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>                  MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>             MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
>>>                           v5.0.1)
>>>             MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
>>>                           v5.0.1)
>>>             MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>                 MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.1)
>>>                 MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
>>>                           v5.0.1)
>>>            MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
>>>                           v5.0.1)
>>>
>>> On Mon, Feb 5, 2024 at 12:48 PM George Bosilca <bosi...@icl.utk.edu>
>>> wrote:
>>>
>>>> OMPI seems unable to create a communication medium between your
>>>> processes. There are a few known issues on OSX; please read
>>>> https://github.com/open-mpi/ompi/issues/12273 for more info.
>>>>
>>>> Can you provide the header of the ompi_info output? What I'm
>>>> interested in is the part about `Configure command line:`.
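>>>>
>>>> Something like this should be enough to pull out just that block (a rough
>>>> sketch; adjust the number of context lines as needed):
>>>>
>>>>     ompi_info | grep -A 10 'Configure command line:'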
>>>>
>>>> George.
>>>>
>>>>
>>>> On Mon, Feb 5, 2024 at 12:18 PM John Haiducek via users <
>>>> users@lists.open-mpi.org> wrote:
>>>>
>>>>> I'm having problems running programs compiled against the Open MPI
>>>>> 5.0.1 package provided by Homebrew on macOS (arm) 12.6.1.
>>>>>
>>>>> When running a Fortran test program that simply calls MPI_Init
>>>>> followed by MPI_Finalize, I get the following output:
>>>>>
>>>>> $ mpirun -n 2 ./mpi_init_test
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment
>>>>> problems.  This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>>
>>>>>   PML add procs failed
>>>>>   --> Returned "Not found" (-13) instead of "Success" (0)
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment
>>>>> problems.  This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>>
>>>>>   ompi_mpi_init: ompi_mpi_instance_init failed
>>>>>   --> Returned "Not found" (-13) instead of "Success" (0)
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [haiducek-lt:00000] *** An error occurred in MPI_Init
>>>>> [haiducek-lt:00000] *** reported by process [1905590273,1]
>>>>> [haiducek-lt:00000] *** on a NULL communicator
>>>>> [haiducek-lt:00000] *** Unknown error
>>>>> [haiducek-lt:00000] *** MPI_ERRORS_ARE_FATAL (processes in this
>>>>> communicator will now abort,
>>>>> [haiducek-lt:00000] ***    and MPI will try to terminate your MPI job
>>>>> as well)
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> prterun detected that one or more processes exited with non-zero
>>>>> status,
>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>>
>>>>>    Process name: [prterun-haiducek-lt-15584@1,1]
>>>>>    Exit code:    14
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
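>>>>> The test program itself isn't included here, but a minimal Fortran sketch
>>>>> of that kind of reproducer (file and program names are placeholders) is:
>>>>>
>>>>>     ! mpi_init_test.f90 -- build with: mpifort mpi_init_test.f90 -o mpi_init_test
>>>>>     program mpi_init_test
>>>>>       use mpi_f08        ! assumes the f08 bindings; "use mpi" also works
>>>>>       implicit none
>>>>>       call MPI_Init()    ! this is where the failure above is reported
>>>>>       call MPI_Finalize()
>>>>>     end program mpi_init_test
>>>>>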
>>>>> I'm not sure whether this is the result of a bug in Open MPI, in the
>>>>> Homebrew package, or a misconfiguration of my system. Any suggestions for
>>>>> troubleshooting this?
>>>>>
>>>>
