That's not for the MPI communications but for the process management part (PRRTE/PMIx). If forcing the PTL to `lo0` worked, it most likely indicates that Open MPI's shared-memory support was set up correctly.
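For reference, PMIx MCA parameters can also be supplied through `PMIX_MCA_`-prefixed environment variables rather than on every `mpirun` command line. A minimal sketch of making the loopback workaround persistent (the `mpi_init_test` binary name is just the example from this thread):

```shell
# Equivalent of `--pmixmca ptl_tcp_if_include lo0`, set once in the environment
# so every subsequent mpirun in this shell picks it up.
export PMIX_MCA_ptl_tcp_if_include=lo0

# mpirun -n 2 ./mpi_init_test   # would now use the setting (requires an Open MPI install)

echo "PMIX_MCA_ptl_tcp_if_include=$PMIX_MCA_ptl_tcp_if_include"
```

Putting the `export` in a shell profile would apply it to all jobs, which may be preferable to editing every launch command while the underlying issue is open.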
George.

On Mon, Feb 5, 2024 at 3:47 PM John Hearns <hear...@gmail.com> wrote:

> Stupid question... Why is it going 'out' to the loopback address? Is
> shared memory not being used these days?
>
> On Mon, Feb 5, 2024, 8:31 PM John Haiducek via users
> <users@lists.open-mpi.org> wrote:
>
>> Adding '--pmixmca ptl_tcp_if_include lo0' to the mpirun argument list
>> seems to fix (or at least work around) the problem.
>>
>> On Mon, Feb 5, 2024 at 1:49 PM John Haiducek <jhaid...@gmail.com> wrote:
>>
>>> Thanks, George; the issue you linked certainly looks related.
>>>
>>> Output from ompi_info:
>>>
>>>   Package: Open MPI brew@Monterey-arm64.local Distribution
>>>   Open MPI: 5.0.1
>>>   Open MPI repo revision: v5.0.1
>>>   Open MPI release date: Dec 20, 2023
>>>   MPI API: 3.1.0
>>>   Ident string: 5.0.1
>>>   Prefix: /opt/homebrew/Cellar/open-mpi/5.0.1
>>>   Configured architecture: aarch64-apple-darwin21.6.0
>>>   Configured by: brew
>>>   Configured on: Wed Dec 20 22:18:10 UTC 2023
>>>   Configure host: Monterey-arm64.local
>>>   Configure command line: '--disable-debug' '--disable-dependency-tracking'
>>>     '--prefix=/opt/homebrew/Cellar/open-mpi/5.0.1'
>>>     '--libdir=/opt/homebrew/Cellar/open-mpi/5.0.1/lib'
>>>     '--disable-silent-rules' '--enable-ipv6'
>>>     '--enable-mca-no-build=reachable-netlink'
>>>     '--sysconfdir=/opt/homebrew/etc'
>>>     '--with-hwloc=/opt/homebrew/opt/hwloc'
>>>     '--with-libevent=/opt/homebrew/opt/libevent'
>>>     '--with-pmix=/opt/homebrew/opt/pmix' '--with-sge'
>>>   Built by: brew
>>>   Built on: Wed Dec 20 22:18:10 UTC 2023
>>>   Built host: Monterey-arm64.local
>>>   C bindings: yes
>>>   Fort mpif.h: yes (single underscore)
>>>   Fort use mpi: yes (full: ignore TKR)
>>>   Fort use mpi size: deprecated-ompi-info-value
>>>   Fort use mpi_f08: yes
>>>   Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
>>>     limitations in the gfortran compiler and/or Open MPI, does not
>>>     support the following: array subsections, direct passthru (where
>>>     possible) to underlying Open MPI's C functionality
>>>   Fort mpi_f08 subarrays: no
>>>   Java bindings: no
>>>   Wrapper compiler rpath: unnecessary
>>>   C compiler: clang
>>>   C compiler absolute: clang
>>>   C compiler family name: CLANG
>>>   C compiler version: 14.0.0 (clang-1400.0.29.202)
>>>   C++ compiler: clang++
>>>   C++ compiler absolute: clang++
>>>   Fort compiler: gfortran
>>>   Fort compiler abs: /opt/homebrew/opt/gcc/bin/gfortran
>>>   Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
>>>   Fort 08 assumed shape: yes
>>>   Fort optional args: yes
>>>   Fort INTERFACE: yes
>>>   Fort ISO_FORTRAN_ENV: yes
>>>   Fort STORAGE_SIZE: yes
>>>   Fort BIND(C) (all): yes
>>>   Fort ISO_C_BINDING: yes
>>>   Fort SUBROUTINE BIND(C): yes
>>>   Fort TYPE,BIND(C): yes
>>>   Fort T,BIND(C,name="a"): yes
>>>   Fort PRIVATE: yes
>>>   Fort ABSTRACT: yes
>>>   Fort ASYNCHRONOUS: yes
>>>   Fort PROCEDURE: yes
>>>   Fort USE...ONLY: yes
>>>   Fort C_FUNLOC: yes
>>>   Fort f08 using wrappers: yes
>>>   Fort MPI_SIZEOF: yes
>>>   C profiling: yes
>>>   Fort mpif.h profiling: yes
>>>   Fort use mpi profiling: yes
>>>   Fort use mpi_f08 prof: yes
>>>   Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
>>>     OMPI progress: no, Event lib: yes)
>>>   Sparse Groups: no
>>>   Internal debug support: no
>>>   MPI interface warnings: yes
>>>   MPI parameter check: runtime
>>>   Memory profiling support: no
>>>   Memory debugging support: no
>>>   dl support: yes
>>>   Heterogeneous support: no
>>>   MPI_WTIME support: native
>>>   Symbol vis. support: yes
>>>   Host topology support: yes
>>>   IPv6 support: yes
>>>   MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
>>>   Fault Tolerance support: yes
>>>   FT MPI support: yes
>>>   MPI_MAX_PROCESSOR_NAME: 256
>>>   MPI_MAX_ERROR_STRING: 256
>>>   MPI_MAX_OBJECT_NAME: 64
>>>   MPI_MAX_INFO_KEY: 36
>>>   MPI_MAX_INFO_VAL: 256
>>>   MPI_MAX_PORT_NAME: 1024
>>>   MPI_MAX_DATAREP_STRING: 128
>>>   MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.1)
>>>   MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.1)
>>>   MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.1)
>>>   MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.1)
>>>   MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.1)
>>>   MCA if: bsdx_ipv6 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.1)
>>>   MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v5.0.1)
>>>   MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.1)
>>>   MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.1)
>>>   MCA timer: darwin (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>   MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.1)
>>>   MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component v5.0.1)
>>>   MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.1)
>>>   MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v5.0.1)
>>>   MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.1)
>>>   MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.1)
>>>   MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>   MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>   MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>   MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.1)
>>>   MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>   MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.1)
>>>   MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v5.0.1)
>>>   MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>>
>>> On Mon, Feb 5, 2024 at 12:48 PM George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>>> OMPI seems unable to create a communication medium between your
>>>> processes. There are a few known issues on OSX; please read
>>>> https://github.com/open-mpi/ompi/issues/12273 for more info.
>>>>
>>>> Can you provide the header of the ompi_info output? What I'm
>>>> interested in is the part about `Configure command line:`.
>>>>
>>>> George.
>>>>
>>>> On Mon, Feb 5, 2024 at 12:18 PM John Haiducek via users
>>>> <users@lists.open-mpi.org> wrote:
>>>>
>>>>> I'm having problems running programs compiled against the OpenMPI
>>>>> 5.0.1 package provided by Homebrew on macOS (arm) 12.6.1.
>>>>>
>>>>> When running a Fortran test program that simply calls MPI_Init
>>>>> followed by MPI_Finalize, I get the following output:
>>>>>
>>>>> $ mpirun -n 2 ./mpi_init_test
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>>   PML add procs failed
>>>>>   --> Returned "Not found" (-13) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>>   ompi_mpi_init: ompi_mpi_instance_init failed
>>>>>   --> Returned "Not found" (-13) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>> [haiducek-lt:00000] *** An error occurred in MPI_Init
>>>>> [haiducek-lt:00000] *** reported by process [1905590273,1]
>>>>> [haiducek-lt:00000] *** on a NULL communicator
>>>>> [haiducek-lt:00000] *** Unknown error
>>>>> [haiducek-lt:00000] *** MPI_ERRORS_ARE_FATAL (processes in this
>>>>> communicator will now abort,
>>>>> [haiducek-lt:00000] *** and MPI will try to terminate your MPI job
>>>>> as well)
>>>>> --------------------------------------------------------------------------
>>>>> prterun detected that one or more processes exited with non-zero status,
>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>>
>>>>>   Process name: [prterun-haiducek-lt-15584@1,1]
>>>>>   Exit code: 14
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> I'm not sure whether this is the result of a bug in OpenMPI, in the
>>>>> Homebrew package, or a misconfiguration of my system. Any suggestions
>>>>> for troubleshooting this?
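For anyone trying to reproduce this, a minimal test program along the lines described above (calls MPI_Init and then MPI_Finalize, nothing else) might look like the following; the program and file names are assumptions, and it uses the `mpi_f08` module that the ompi_info output reports as supported:

```
! mpi_init_test.f90 -- minimal MPI lifecycle test: initialize, then finalize.
program mpi_init_test
   use mpi_f08        ! ierror arguments are optional in the f08 bindings
   implicit none
   call MPI_Init()
   call MPI_Finalize()
end program mpi_init_test
```

With an Open MPI installation on the PATH, it could be built and run with the compiler wrapper, e.g. `mpifort mpi_init_test.f90 -o mpi_init_test && mpirun -n 2 ./mpi_init_test`; on an affected system this should reproduce the failure shown above without the workaround.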