Actually, I think the problem might be a little more subtle.

I see that you configured with both --enable-static and --enable-shared.

My gut reaction is that there might be some kind of issue with enabling both of 
those options (by default, shared is enabled and static is disabled).  If you 
configure+build with just one of those two options, does it work?
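
For concreteness, here is a minimal sketch of the two single-mode builds. `--disable-static` and `--disable-shared` are the standard counterparts of the two options above. The `configure_cmd` helper is hypothetical and only prints the command line; the prefix is copied from the script quoted later in this thread, and any other options would be appended unchanged.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a configure line that enables
# exactly one library type.
configure_cmd() {
    case "$1" in
        shared) lib_flags="--enable-shared --disable-static" ;;
        static) lib_flags="--enable-static --disable-shared" ;;
        *) echo "usage: configure_cmd shared|static" >&2; return 1 ;;
    esac
    echo "./configure --prefix=/shared/maylab/mayapps/mpi/openmpi/4.1.4 $lib_flags"
}

configure_cmd shared
configure_cmd static
```

If only one of the two variants builds cleanly, that would point at the combination of the two options rather than at either one alone.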

--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Pritchard Jr., 
Howard via users <users@lists.open-mpi.org>
Sent: Wednesday, October 5, 2022 11:47 AM
To: Jeffrey D. (JD) Tamucci <jeffrey.tamu...@uconn.edu>
Cc: Pritchard Jr., Howard <howa...@lanl.gov>; Open MPI Users 
<users@lists.open-mpi.org>
Subject: Re: [OMPI users] [EXTERNAL] Beginner Troubleshooting OpenMPI 
Installation - pmi.h Error


Hi Jeff,



I think you are now at the “send the system admin an email to install RPMs” 
stage. In particular, ask that the numa and udev devel RPMs be installed. They 
will need to install these RPMs on the compute node image(s) as well.
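
Some background on why the devel packages matter: `-lnuma` and `-ludev` make the linker look for an unversioned `libnuma.so` / `libudev.so` (or the `.a` equivalent), and those files normally ship in the devel packages rather than the runtime ones. On RHEL 8 I would expect these to be `numactl-devel` and `systemd-devel`, but treat those package names as an assumption to confirm with the admin. A rough way to check a node is sketched here; the `check_lib` helper and the directory list are illustrative and not taken from the build logs.

```shell
#!/bin/sh
# Report whether the linker-visible file for -l<name> exists in any of
# the given directories. Helper name and directory list are illustrative.
check_lib() {
    name="$1"; shift
    for dir in "$@"; do
        for ext in so a; do
            if [ -e "$dir/lib$name.$ext" ]; then
                echo "found: $dir/lib$name.$ext"
                return 0
            fi
        done
    done
    echo "missing: -l$name"
    return 1
}

# Check the two libraries named in the linker error.
for lib in numa udev; do
    check_lib "$lib" /usr/lib64 /usr/lib /lib64 || true
done
```

Running the same check on a compute node as well as the login node would show whether both images have the files.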



Howard





From: "Jeffrey D. (JD) Tamucci" <jeffrey.tamu...@uconn.edu>
Date: Wednesday, October 5, 2022 at 9:20 AM
To: "Pritchard Jr., Howard" <howa...@lanl.gov>
Cc: "bbarr...@amazon.com" <bbarr...@amazon.com>, Open MPI Users 
<users@lists.open-mpi.org>
Subject: Re: [EXTERNAL] [OMPI users] Beginner Troubleshooting OpenMPI 
Installation - pmi.h Error



Gladly. I tried it that way, and it worked in that the build was able to find 
pmi.h. Unfortunately, there's a new linker error: it cannot find -lnuma and 
-ludev.



make[2]: Entering directory '/shared/maylab/src/openmpi-4.1.4/opal'
  CCLD     libopen-pal.la
/usr/bin/ld: cannot find -lnuma
/usr/bin/ld: cannot find -ludev
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:2249: libopen-pal.la] Error 1
make[2]: Leaving directory '/shared/maylab/src/openmpi-4.1.4/opal'
make[1]: *** [Makefile:2394: install-recursive] Error 1
make[1]: Leaving directory '/shared/maylab/src/openmpi-4.1.4/opal'
make: *** [Makefile:1912: install-recursive] Error 1



Here is a Dropbox link to the full output: 
https://www.dropbox.com/s/4rv8n2yp320ix08/ompi-output_Oct4_2022.tar.bz2?dl=0



Thank you for your help!



JD





Jeffrey D. (JD) Tamucci

University of Connecticut

Molecular & Cell Biology

RA in Lab of Eric R. May

PhD / MPH Candidate

he/him





On Tue, Oct 4, 2022 at 1:51 PM Pritchard Jr., Howard <howa...@lanl.gov> wrote:

*Message sent from a system outside of UConn.*



Could you change the --with-pmi option to be

--with-pmi=/cm/shared/apps/slurm/21.08.8



?
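
The reasoning, as far as I understand it: `--with-pmi` takes the Slurm installation prefix and configure looks for `pmi.h` beneath it, so passing the `include/slurm` directory itself sends the search one level too deep. Here is a small sketch to confirm a candidate prefix before re-running configure. The `find_pmi_header` helper and the two subdirectories it checks are assumptions for illustration, not taken from the Open MPI sources.

```shell
#!/bin/sh
# Print the path of pmi.h if it sits in a conventional location under
# the given prefix. The subdirectory list is an assumption.
find_pmi_header() {
    prefix="$1"
    for sub in include include/slurm; do
        if [ -f "$prefix/$sub/pmi.h" ]; then
            echo "$prefix/$sub/pmi.h"
            return 0
        fi
    done
    return 1
}

# Example call with the prefix suggested in this message.
find_pmi_header /cm/shared/apps/slurm/21.08.8 \
    || echo "pmi.h not found under that prefix"
```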





From: "Jeffrey D. (JD) Tamucci" <jeffrey.tamu...@uconn.edu>
Date: Tuesday, October 4, 2022 at 10:40 AM
To: "Pritchard Jr., Howard" <howa...@lanl.gov>, "bbarr...@amazon.com" 
<bbarr...@amazon.com>
Cc: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [EXTERNAL] [OMPI users] Beginner Troubleshooting OpenMPI 
Installation - pmi.h Error



Hi Howard and Brian,



Of course. Here's a Dropbox link to the full folder: 
https://www.dropbox.com/s/raqlcnpgk9wz78b/ompi-output_Sep30_2022.tar.bz2?dl=0



These were the configure and make commands:

./configure \
        --prefix=/shared/maylab/mayapps/mpi/openmpi/4.1.4 \
        --with-slurm \
        --with-lsf=no \
        --with-pmi=/cm/shared/apps/slurm/21.08.8/include/slurm \
        --with-pmi-libdir=/cm/shared/apps/slurm/21.08.8/lib64 \
        --with-hwloc=/cm/shared/apps/hwloc/1.11.11 \
        --with-cuda=/gpfs/sharedfs1/admin/hpc2.0/apps/cuda/11.6 \
        --enable-shared \
        --enable-static &&
make -j 32 &&
make -j 32 check
make install

The output of the make command is in the install_open-mpi_4.1.4_hpc2.log file.
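
One detail of the chain above is worth noting: there is no `&&` after `make -j 32 check`, so `make install` runs even when the checks fail (unless the script sets `set -e` elsewhere). A minimal fail-fast sketch follows; the `run_steps` helper is hypothetical, and the commented usage reuses the commands from the script.

```shell
#!/bin/sh
# Run each command string in order and stop at the first failure.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        sh -c "$step" || { echo "FAILED: $step" >&2; return 1; }
    done
}

# Intended usage, left commented out because it needs the source tree:
# run_steps "./configure <options>" "make -j 32" "make -j 32 check" "make install"
```

Printing each step before running it also makes it easier to see in the log which stage produced an error.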





Jeffrey D. (JD) Tamucci

University of Connecticut

Molecular & Cell Biology

RA in Lab of Eric R. May

PhD / MPH Candidate

he/him





On Tue, Oct 4, 2022 at 12:33 PM Pritchard Jr., Howard <howa...@lanl.gov> wrote:



Hi JD,



Could you post the configure options your script uses to build Open MPI?



Howard



From: users <users-boun...@lists.open-mpi.org> on behalf of "Jeffrey D. (JD) 
Tamucci via users" <users@lists.open-mpi.org>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Tuesday, October 4, 2022 at 10:07 AM
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: "Jeffrey D. (JD) Tamucci" <jeffrey.tamu...@uconn.edu>
Subject: [EXTERNAL] [OMPI users] Beginner Troubleshooting OpenMPI Installation 
- pmi.h Error



Hi,



I have been trying to install OpenMPI v4.1.4 on a university HPC cluster. We 
use the Bright cluster manager and have SLURM v21.08.8 and RHEL 8.6. I used a 
script to install OpenMPI that a former co-worker had used to successfully 
install OpenMPI v3.0.0 previously. I updated it to include new versions of the 
dependencies and new paths to those installs.



Each time, it fails in the make install step. There is a fatal error about 
finding pmi.h. It specifically says:



make[2]: Entering directory '/shared/maylab/src/openmpi-4.1.4/opal/mca/pmix/s1'
  CC       libmca_pmix_s1_la-pmix_s1_component.lo
  CC       libmca_pmix_s1_la-pmix_s1.lo
pmix_s1.c:29:10: fatal error: pmi.h: No such file or directory
   29 | #include <pmi.h>



I've looked through the archives and seen others face similar errors in years 
past but I couldn't understand the solutions. One person suggested that SLURM 
may be missing PMI libraries. I think I've verified that SLURM has PMI. I 
include paths to those files and it seems to find them earlier in the process.



I'm not sure what the next step is in troubleshooting this. I have included a 
bz2 file containing my install script, a log file containing the script output 
(from build, make, make install), the config.log, and the opal_config.h file. 
If anyone could provide any guidance, I'd sincerely appreciate it.



Best,

JD
