Guido,

This error message comes from MPICH, not Open MPI: the mpirun your job is picking up is MPICH's mpiexec, which is why it does not recognize the --mca option.

Make sure your environment is correct (in particular, that Open MPI's mpirun is the one found first in PATH on the compute nodes) and that the shared filesystem is mounted on the compute nodes.
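
As a quick sanity check (only a sketch; the install path below is simply the one from your job script), you could add something like this to the job script right before the mpirun line:

------
# which mpirun does the job shell actually resolve?
type mpirun

# is the Open MPI install visible from this compute node, and is it the one you expect?
ls /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin
ompi_info | head
------

If "type mpirun" does not point into that openmpi-4.0.2/bin directory, the PATH setting in the script is not taking effect on the compute node.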


Cheers,

Gilles

Sent from my iPod

> On Dec 12, 2019, at 1:44, Guido granda muñoz via users 
> <users@lists.open-mpi.org> wrote:
> 
> Hi, 
> after following the instructions in the error message, in other words running it 
> like this:
> 
> #!/bin/bash
> #PBS -l nodes=1:ppn=32
> #PBS -N mc_cond_0_h3
> #PBS -o mc_cond_0_h3.o
> #PBS -e mc_cond_0_h3.e
> 
> PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
> LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
> cd $PBS_O_WORKDIR
> mpirun --mca btl vader,self -np 32 ./flash4 
> 
> I get the following error messages:
> 
> [mpiexec@compute-0-34.local] match_arg (utils/args/args.c:159): unrecognized argument mca
> [mpiexec@compute-0-34.local] HYDU_parse_array (utils/args/args.c:174): argument matching returned error
> [mpiexec@compute-0-34.local] parse_args (ui/mpich/utils.c:1596): error parsing input array
> [mpiexec@compute-0-34.local] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1648): unable to parse user arguments
> [mpiexec@compute-0-34.local] main (ui/mpich/mpiexec.c:149): error parsing parameters
> 
> Am I running it incorrectly?
> Cheers,
> 
> On Tue, Dec 10, 2019 at 15:40, Guido granda muñoz (<guidogra...@gmail.com>) wrote:
>> Hello,
>> I have now compiled the application using openmpi-4.0.2; ldd on the binary shows:
>> 
>>  linux-vdso.so.1 =>  (0x00007fffb23ff000)
>> libhdf5.so.103 => /home/guido/libraries/compiled_with_gcc-7.3.0/hdf5-1.10.5_serial/lib/libhdf5.so.103 (0x00002b3cd188c000)
>> libz.so.1 => /lib64/libz.so.1 (0x00002b3cd1e74000)
>> libmpi_usempif08.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempif08.so.40 (0x00002b3cd208a000)
>> libmpi_usempi_ignore_tkr.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempi_ignore_tkr.so.40 (0x00002b3cd22c0000)
>> libmpi_mpifh.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_mpifh.so.40 (0x00002b3cd24c7000)
>> libmpi.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi.so.40 (0x00002b3cd2723000)
>> libgfortran.so.4 => /share/apps/gcc-7.3.0/lib64/libgfortran.so.4 (0x00002b3cd2a55000)
>> libm.so.6 => /lib64/libm.so.6 (0x00002b3cd2dc3000)
>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b3cd3047000)
>> libquadmath.so.0 => /share/apps/gcc-5.4.0/lib64/libquadmath.so.0 (0x00002b3cd325e000)
>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b3cd349c000)
>> libc.so.6 => /lib64/libc.so.6 (0x00002b3cd36b9000)
>> librt.so.1 => /lib64/librt.so.1 (0x00002b3cd3a4e000)
>> libdl.so.2 => /lib64/libdl.so.2 (0x00002b3cd3c56000)
>> libopen-rte.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-rte.so.40 (0x00002b3cd3e5b000)
>> libopen-pal.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-pal.so.40 (0x00002b3cd4110000)
>> libudev.so.0 => /lib64/libudev.so.0 (0x00002b3cd4425000)
>> libutil.so.1 => /lib64/libutil.so.1 (0x00002b3cd4634000)
>> /lib64/ld-linux-x86-64.so.2 (0x00002b3cd166a000)
>> 
>> and ran it like this:
>> 
>> #!/bin/bash
>> #PBS -l nodes=1:ppn=32
>> #PBS -N mc_cond_0_h3 
>> #PBS -o mc_cond_0_h3.o
>> #PBS -e mc_cond_0_h3.e
>> 
>> PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
>> LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
>> cd $PBS_O_WORKDIR
>> mpirun -np 32 ./flash4 
>> 
>> and now I'm getting these error messages:
>> 
>> --------------------------------------------------------------------------
>> As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.
>> 
>> Efficient, high-speed same-node shared memory communication support in
>> Open MPI is available in the "vader" BTL. To use the vader BTL, you
>> can re-run your job with:
>> 
>> mpirun --mca btl vader,self,... your_mpi_application
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> A requested component was not found, or was unable to be opened. This
>> means that this component is either not installed or is unable to be
>> used on your system (e.g., sometimes this means that shared libraries
>> that the component requires are unable to be found/loaded). Note that
>> Open MPI stopped checking at the first component that it did not find.
>> 
>> Host: compute-0-34.local
>> Framework: btl
>> Component: sm
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>> 
>> mca_bml_base_open() failed
>> --> Returned "Not found" (-13) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> [compute-0-34:16915] *** An error occurred in MPI_Init
>> [compute-0-34:16915] *** reported by process [3776708609,5]
>> [compute-0-34:16915] *** on a NULL communicator
>> [compute-0-34:16915] *** Unknown error
>> [compute-0-34:16915] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> [compute-0-34:16915] *** and potentially your MPI job)
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
>> [compute-0-34.local:16902] 31 more processes have sent help message help-mpi-btl-sm.txt / btl sm is dead
>> [compute-0-34.local:16902] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>> [compute-0-34.local:16902] 31 more processes have sent help message help-mca-base.txt / find-available:not-valid
>> [compute-0-34.local:16902] 31 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
>> [compute-0-34.local:16902] 31 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
>> /var/spool/torque/mom_priv/jobs/4110.mouruka.crya.privado.SC: line 11: /home/guido: is a directory
>> 
>> Do you know what could cause this error?
>> 
>> Thank you,
>> 
>> 
>>> On Fri, Dec 6, 2019 at 12:13, Jeff Squyres (jsquyres) (<jsquy...@cisco.com>) wrote:
>>> On Dec 6, 2019, at 1:03 PM, Jeff Squyres (jsquyres) via users 
>>> <users@lists.open-mpi.org> wrote:
>>> > 
>>> >> I get the same error when running on a single node. I will try to use 
>>> >> the latest version. Is there a way to check whether different versions of 
>>> >> Open MPI were used on different nodes? 
>>> > 
>>> > mpirun -np 2 ompi_info | head
>>> > 
>>> > Or something like that.  With 1.10, I don't know/remember the mpirun CLI 
>>> > option to make one process per node (when ppn>1); you may have to check 
>>> > that.  Or just "mpirun -np 33 ompi_info | head" and examine the output 
>>> > carefully to find the 33rd output and see if it's different.
>>> 
>>> Poor quoting on my part.  The intent was to see just the first few lines 
>>> from running `ompi_info` on each node.
>>> 
>>> So maybe something like:
>>> 
>>> ------
>>> $ cat foo.sh
>>> #!/bin/sh
>>> ompi_info | head
>>> $ chmod +x foo.sh
>>> $ mpirun -np 2 ./foo.sh
>>> ------
>>> 
>>> Or "mprun -np 33 foo.sh", ....etc.
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> 
>> 
>> 
>> -- 
>> Guido
> 
> 
> -- 
> Guido
