Hello,
I installed Open MPI from sources, and all the libraries and proper
include files were installed correctly in /opt/openmpi/4.0.0,
because I prefer a directory that I can export via NFS rather than the
default /usr/local.
Anyway, SLURM's configure still complains and is not happy:
./configure --with-pmix=/opt/openmpi/4.0.0/
configure:20846: checking for pmix installation
configure:20881: gcc -o conftest -g -O2 -pthread -I/opt/openmpi/4.0.0//include conftest.c -L/opt/openmpi/4.0.0//lib -lpmix >&5
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_evthread_use_pthreads'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_event_base_loop'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_event_add'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_event_base_free'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_event_active'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_event_base_loopbreak'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_event_del'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_event_assign'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to `opal_libevent2022_event_base_new'
collect2: error: ld returned 1 exit status
configure:20881: $? = 1
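Every symbol the linker cannot resolve above carries the same opal_libevent2022_ prefix. That prefix looks like Open MPI's renamed copy of its embedded libevent, so one plausible reading (an assumption, not confirmed in this thread) is that this libpmix.so expects Open MPI's own internal libraries to be linked alongside it, which SLURM's bare -lpmix test does not do. A minimal Python sketch that pulls the missing symbols out of a saved config.log excerpt and groups them by prefix makes the pattern easy to check; undefined_symbols and symbol_prefixes are hypothetical helper names, not part of SLURM or Open MPI.

```python
import re
from collections import Counter
from typing import List

# Matches GNU ld messages such as
#   .../libpmix.so: undefined reference to `opal_libevent2022_event_add'
# \s* also tolerates a line break that a mail client may insert before the symbol.
UNDEF_RE = re.compile(r"undefined reference to\s*[`'](?P<sym>[^`']+)'")


def undefined_symbols(ld_output: str) -> List[str]:
    """Return the undefined symbols reported by ld, in order of appearance."""
    return [m.group("sym") for m in UNDEF_RE.finditer(ld_output)]


def symbol_prefixes(symbols: List[str]) -> Counter:
    """Count symbols by their first two underscore-separated parts,
    e.g. 'opal_libevent2022_event_add' -> 'opal_libevent2022'."""
    return Counter("_".join(s.split("_")[:2]) for s in symbols)
```

Feeding it the config.log text above should report all nine symbols under a single opal_libevent2022 prefix. If that is what you see, one route worth verifying on the list is to build a standalone PMIx and point both SLURM's --with-pmix and Open MPI's --with-pmix at that same installation.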
I did set LD_LIBRARY_PATH:
echo $LD_LIBRARY_PATH
/opt/openmpi/4.0.0/lib
Any hints?
Thank you very much,
Rick
On 3/12/19 5:19 PM, Gilles Gouaillardet wrote:
Rick,
The issue is that your SLURM build can only provide PMI-2 support, while it
seems your Open MPI build only supports PMIx.
One option is to rebuild SLURM with PMIx as explained by Daniel, and then
srun --mpi=pmix ...
If you do not want to (or cannot) rebuild SLURM, you can use the older
PMI or PMI-2 interfaces.
In that case, you have to rebuild Open MPI and pass --with-pmi to the
configure command line
and then
srun --mpi=pmi2 ...
(or srun --mpi=pmi ...)
Finally, you can check the default MPI plugin with
scontrol show config | grep MpiDefault
and have your sysadmin update it so that a simple
srun ...
runs without any --mpi=... parameter.
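The MpiDefault lookup can also be scripted. This is a small Python sketch, assuming scontrol prints the setting as a "MpiDefault = value" line as it usually does; mpi_default and read_mpi_default are hypothetical helper names. On the cluster side the value comes from the MpiDefault parameter in slurm.conf (for example MpiDefault=pmix once a PMIx-enabled SLURM is installed).

```python
import re
import subprocess
from typing import Optional

# Matches lines like "MpiDefault              = none"
MPI_DEFAULT_RE = re.compile(r"^\s*MpiDefault\s*=\s*(\S+)", re.MULTILINE)


def mpi_default(scontrol_output: str) -> Optional[str]:
    """Return the MpiDefault value from `scontrol show config` text, if present."""
    m = MPI_DEFAULT_RE.search(scontrol_output)
    return m.group(1) if m else None


def read_mpi_default() -> Optional[str]:
    """Run `scontrol show config` (needs a reachable slurmctld) and parse the result."""
    out = subprocess.run(["scontrol", "show", "config"],
                         capture_output=True, text=True, check=True).stdout
    return mpi_default(out)
```

Splitting the parser from the subprocess call keeps the parsing testable on a machine without SLURM.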
Cheers,
Gilles
On 3/13/2019 5:53 AM, Riccardo Veraldi wrote:
Hello,
After trying hard for over 10 days, I am forced to write to the list.
I cannot get SLURM to work with Open MPI. Binaries compiled with Open MPI
won't run under SLURM, while all non-Open MPI programs run just
fine under "srun". I am using SLURM 18.08.5, building the RPM from the
tarball: rpmbuild -ta slurm-18.08.5-2.tar.bz2
Before building SLURM, I installed Open MPI 4.0.0, which has built-in
PMIx support. The PMIx libraries are in /usr/lib64/pmix/, which is the
default installation path.
The problem is that hellompi does not work when I launch it from srun.
Of course, it runs fine outside SLURM.
[psanagpu105:10995] OPAL ERROR: Not initialized in file pmix3x_client.c at line 113
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:
version 16.05 or later: you can use SLURM's PMIx support. This
requires that you configure and build SLURM --with-pmix.
Versions earlier than 16.05: you must use either SLURM's PMI-1 or
PMI-2 support. SLURM builds PMI-1 by default, or you can manually
install PMI-2. You must then build Open MPI using --with-pmi pointing
to the SLURM PMI library location.
Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[psanagpu105:10995] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: psanagpu105: task 0: Exited with exit code 1
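The help text above boils down to a version rule. A Python sketch of that rule, with the 16.05 cutoff taken straight from the message and slurm_major_minor and pmi_route as hypothetical names, shows that SLURM 18.08.5 falls on the PMIx side, i.e. SLURM itself needs to be configured --with-pmix:

```python
from typing import Tuple


def slurm_major_minor(version: str) -> Tuple[int, int]:
    """Turn '18.08.5' into (18, 8); only major.minor matter for this rule."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pmi_route(slurm_version: str) -> str:
    """Suggest a PMI route for a SLURM version, following Open MPI's help text."""
    if slurm_major_minor(slurm_version) >= (16, 5):
        # SLURM must be configured and built --with-pmix; launch with srun --mpi=pmix
        return "PMIx"
    # Older SLURM: use PMI-1 or PMI-2 and build Open MPI --with-pmi
    return "PMI-1/PMI-2"
```

The rule only encodes what the message recommends; the PMI-2 route also remains available on newer SLURM, provided Open MPI is rebuilt --with-pmi.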
I really have no clue. I even reinstalled Open MPI in a separate
path, /opt/openmpi/4.0.0.
Anyway, it seems SLURM does not know how to find the MPI libraries,
even though they are there, and right now they are in the default path /usr/lib64.
Even using --mpi=pmi2 or --mpi=openmpi does not fix the problem;
I get the same error message.
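With PMIx libraries now present in more than one place (/usr/lib64/pmix/, /usr/lib64 and /opt/openmpi/4.0.0), a quick inventory of every libpmix copy can help confirm which one SLURM's configure and the Open MPI runtime actually pick up. A minimal Python sketch; find_libpmix is a hypothetical helper, and the candidate directories are simply those mentioned in this thread.

```python
import glob
import os
from typing import Iterable, List


def find_libpmix(dirs: Iterable[str]) -> List[str]:
    """List libpmix shared objects sitting directly in each candidate directory."""
    hits: List[str] = []
    for d in dirs:
        # "libpmix.so*" also catches versioned names such as libpmix.so.2
        hits.extend(sorted(glob.glob(os.path.join(d, "libpmix.so*"))))
    return hits


# Directories mentioned in this thread; the real layout may differ
# (the library might, for instance, sit in a subdirectory of /usr/lib64/pmix).
CANDIDATES = ["/usr/lib64", "/usr/lib64/pmix", "/opt/openmpi/4.0.0/lib"]
```

Calling find_libpmix(CANDIDATES) on the build host lists every copy it finds; more than one hit is a hint that configure and the runtime may be using different PMIx builds.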
srun --mpi=list
srun: MPI types are...
srun: none
srun: openmpi
srun: pmi2
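The plugin list above has no PMIx entry, which fits Gilles's diagnosis that this SLURM build lacks PMIx support. A minimal Python check for that, assuming a PMIx-enabled SLURM reports at least one plugin whose name starts with "pmix" (srun_mpi_types and has_pmix are hypothetical helpers):

```python
from typing import List


def srun_mpi_types(list_output: str) -> List[str]:
    """Parse `srun --mpi=list` output into the plugin names it reports."""
    types: List[str] = []
    for line in list_output.splitlines():
        line = line.strip()
        # Each output line is prefixed with "srun:"
        if line.startswith("srun:"):
            line = line[len("srun:"):].strip()
        # Skip blank lines and the "MPI types are..." header
        if line and not line.startswith("MPI types are"):
            types.append(line)
    return types


def has_pmix(types: List[str]) -> bool:
    """True if any reported plugin name starts with 'pmix' (assumed naming)."""
    return any(t.startswith("pmix") for t in types)
```

Run against the output above, it returns ["none", "openmpi", "pmi2"] and has_pmix reports False, so srun --mpi=pmix would not be available until SLURM is rebuilt with PMIx.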
Any hints on how I could fix this problem?
Thanks a lot,
Rick