Hello Bennet,

What you are trying to do using srun as the job launcher should work.
Could you post the contents of /etc/slurm/slurm.conf for your system?
Could you also post the output of the following command to the mailing list:

    ompi_info --all | grep pmix

The config.log from your build would also be useful.

Howard

2017-11-16 9:30 GMT-07:00 r...@open-mpi.org <r...@open-mpi.org>:
> What Charles said was true but not quite complete. We still support the
> older PMI libraries, but you likely have to point us to wherever Slurm put
> them.
>
> However, we definitely recommend using PMIx, as you will get a faster launch.
>
> Sent from my iPad
>
> > On Nov 16, 2017, at 9:11 AM, Bennet Fauber <ben...@umich.edu> wrote:
> >
> > Charlie,
> >
> > Thanks a ton! Yes, we are missing two of the three steps.
> >
> > Will report back after we get PMIx installed and after we rebuild
> > Slurm. We do have a new enough version of it, at least, so we might
> > have missed the target, but we did at least hit the barn. ;-)
> >
> >> On Thu, Nov 16, 2017 at 10:54 AM, Charles A Taylor <chas...@ufl.edu> wrote:
> >> Hi Bennet,
> >>
> >> Three things...
> >>
> >> 1. Open MPI 2.x requires PMIx in lieu of pmi1/pmi2.
> >>
> >> 2. You will need Slurm 16.05 or greater built with --with-pmix.
> >>
> >> 2a. You will need pmix 1.1.5, which you can get from GitHub
> >> (https://github.com/pmix/tarballs).
> >>
> >> 3. Then, to launch your MPI tasks on the allocated resources:
> >>
> >>     srun --mpi=pmix ./hello-mpi
> >>
> >> I'm replying to the list because
> >>
> >> a) this information is harder to find than you might think, and
> >> b) someone/anyone can correct me if I'm giving a bum steer.
> >>
> >> Hope this helps,
> >>
> >> Charlie Taylor
> >> University of Florida
> >>
> >> On Nov 16, 2017, at 10:34 AM, Bennet Fauber <ben...@umich.edu> wrote:
> >>
> >> I think that Open MPI is supposed to support Slurm integration such that
> >>
> >>     srun ./hello-mpi
> >>
> >> should work?
> >> I built OMPI 2.1.2 with
> >>
> >> export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
> >> export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
> >>
> >> CMD="./configure \
> >>     --prefix=${PREFIX} \
> >>     --mandir=${PREFIX}/share/man \
> >>     --with-slurm \
> >>     --with-pmi \
> >>     --with-lustre \
> >>     --with-verbs \
> >>     $CONFIGURE_FLAGS \
> >>     $COMPILERS"
> >>
> >> I have a simple hello-mpi.c (source included below), which compiles
> >> and runs with mpirun, both on the login node and in a job. However,
> >> when I try to use srun in place of mpirun, I get instead a hung job,
> >> which upon cancellation produces this output:
> >>
> >> [bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]:
> >> PMI is not initialized
> >> [bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]:
> >> PMI is not initialized
> >> [warn] opal_libevent2022_event_active: event has no event_base set.
> >> [warn] opal_libevent2022_event_active: event has no event_base set.
> >> slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
> >> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> >> slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
> >>
> >> The Slurm web page suggests that OMPI 2.x and later support PMIx, and
> >> to use `srun --mpi=pmix`; however, that no longer seems to be an
> >> option, and using the `openmpi` type isn't working (neither is pmi2).
> >>
> >> [bennet@beta-build hello]$ srun --mpi=list
> >> srun: MPI types are...
> >> srun: mpi/pmi2
> >> srun: mpi/lam
> >> srun: mpi/openmpi
> >> srun: mpi/mpich1_shmem
> >> srun: mpi/none
> >> srun: mpi/mvapich
> >> srun: mpi/mpich1_p4
> >> srun: mpi/mpichgm
> >> srun: mpi/mpichmx
> >>
> >> To get the Intel PMI to work with srun, I have to set
> >>
> >>     I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
> >>
> >> Is there a comparable environment variable that must be set to enable
> >> `srun` to work?
> >> Am I missing a build option or misspecifying one?
> >>
> >> -- bennet
> >>
> >>
> >> Source of hello-mpi.c
> >> ==========================================
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include "mpi.h"
> >>
> >> int main(int argc, char **argv){
> >>
> >>   int rank;      /* rank of process */
> >>   int numprocs;  /* size of COMM_WORLD */
> >>   int namelen;
> >>   int tag=10;    /* expected tag */
> >>   int message;   /* Recv'd message */
> >>   char processor_name[MPI_MAX_PROCESSOR_NAME];
> >>   MPI_Status status;  /* status of recv */
> >>
> >>   /* call Init, size, and rank */
> >>   MPI_Init(&argc, &argv);
> >>   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
> >>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>   MPI_Get_processor_name(processor_name, &namelen);
> >>
> >>   printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
> >>
> >>   if(rank != 0){
> >>     MPI_Recv(&message,        /* buffer for message */
> >>              1,               /* MAX count to recv */
> >>              MPI_INT,         /* type to recv */
> >>              0,               /* recv from 0 only */
> >>              tag,             /* tag of message */
> >>              MPI_COMM_WORLD,  /* communicator to use */
> >>              &status);        /* status object */
> >>     printf("Hello from process %d!\n", rank);
> >>   }
> >>   else{
> >>     /* rank 0 ONLY executes this */
> >>     printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
> >>     int x;
> >>     for(x=1; x<numprocs; x++){
> >>       MPI_Send(&x,              /* send x to process x */
> >>                1,               /* number to send */
> >>                MPI_INT,         /* type to send */
> >>                x,               /* rank to send to */
> >>                tag,             /* tag for message */
> >>                MPI_COMM_WORLD); /* communicator to use */
> >>     }
> >>   } /* end else */
> >>
> >>   /* always call at end */
> >>   MPI_Finalize();
> >>
> >>   return 0;
> >> }
> >> _______________________________________________
> >> users mailing list
> >> users@lists.open-mpi.org
> >> https://lists.open-mpi.org/mailman/listinfo/users
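[Editor's note: Charles's three-step recipe above can be sketched as the following shell commands. The install prefixes (/opt/pmix, /opt/slurm), source-tree names, and -j4 parallelism are illustrative assumptions, not values from the thread.]

```shell
# Sketch only; prefixes, directory names, and versions are assumptions.

# 1. Build and install PMIx 1.1.5 (source tree assumed already unpacked
#    from the GitHub tarball Charles points to).
cd pmix-1.1.5
./configure --prefix=/opt/pmix
make -j4 install
cd ..

# 2. Rebuild Slurm (16.05 or newer) with PMIx support, pointing its
#    configure script at the PMIx installation from step 1.
cd slurm-16.05
./configure --prefix=/opt/slurm --with-pmix=/opt/pmix
make -j4 install
cd ..

# 3. Launch MPI ranks through Slurm's PMIx plugin rather than mpirun.
srun --mpi=pmix ./hello-mpi
```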
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
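[Editor's note: the diagnostics Howard requests at the top of the thread, plus a quick check that the pmix plugin actually appears after rebuilding Slurm, can be run as below. The slurm.conf path is the stock location and may differ per site.]

```shell
# Was Open MPI built with PMIx support? (components mentioning pmix
# should appear in the output)
ompi_info --all | grep pmix

# Does srun offer a pmix plugin? mpi/pmix should show up in this list
# once Slurm has been rebuilt with --with-pmix.
srun --mpi=list

# The Slurm configuration file requested in the first message.
cat /etc/slurm/slurm.conf
```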