Hello Bennet,

What you are trying to do using srun as the job launcher should work.
Could you post the contents
of /etc/slurm/slurm.conf for your system?

Could you also post the output of the following command:

ompi_info --all | grep pmix

to the mailing list.

The config.log from your build would also be useful.
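
For example, something along these lines should capture everything (the
output file name is just a suggestion, and config.log sits at the top of
whatever directory you ran configure in):

  ompi_info --all | grep pmix > ompi_info_pmix.txt
  cp /etc/slurm/slurm.conf .
  cp /path/to/openmpi-2.1.2/config.log .   # adjust to your build directory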

Howard

2017-11-16 9:30 GMT-07:00 r...@open-mpi.org <r...@open-mpi.org>:

> What Charles said was true but not quite complete. We still support the
> older PMI libraries but you likely have to point us to wherever slurm put
> them.
>
> However, we definitely recommend using PMIx, as you will get a faster launch.
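>
> For example, a configure line along these lines ought to pick up Slurm's
> older PMI libraries (the prefix below is only a guess; point it at
> whatever directory actually contains include/pmi.h and lib/libpmi.so on
> your system):
>
>   ./configure --with-slurm --with-pmi=/opt/slurm ...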
>
> Sent from my iPad
>
> > On Nov 16, 2017, at 9:11 AM, Bennet Fauber <ben...@umich.edu> wrote:
> >
> > Charlie,
> >
> > Thanks a ton!  Yes, we are missing two of the three steps.
> >
> > Will report back after we get pmix installed and after we rebuild
> > Slurm.  We do have a new enough version of it, at least, so we might
> > have missed the target, but we did at least hit the barn.  ;-)
> >
> >
> >
> >> On Thu, Nov 16, 2017 at 10:54 AM, Charles A Taylor <chas...@ufl.edu> wrote:
> >> Hi Bennet,
> >>
> >> Three things (a rough build sketch follows these steps)...
> >>
> >> 1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2.
> >>
> >> 2. You will need Slurm 16.05 or greater built with --with-pmix.
> >>
> >> 2a. You will need PMIx 1.1.5, which you can get from GitHub
> >> (https://github.com/pmix/tarballs).
> >>
> >> 3. Then, to launch your MPI tasks on the allocated resources:
> >>
> >>   srun --mpi=pmix ./hello-mpi
> >>
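> >> A rough sketch of the build order, assuming release tarballs and
> >> placeholder versions/prefixes (adjust these to your site):
> >>
> >>   # 1. external PMIx
> >>   cd pmix-1.1.5 && ./configure --prefix=/opt/pmix && make install
> >>
> >>   # 2. Slurm configured against that PMIx
> >>   cd slurm-<version> && ./configure --with-pmix=/opt/pmix && make install
> >>
> >>   # 3. Open MPI configured against the same PMIx
> >>   cd openmpi-2.1.2 && \
> >>     ./configure --with-slurm --with-pmix=/opt/pmix && make install
> >>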
> >> I’m replying to the list because,
> >>
> >> a) this information is harder to find than you might think.
> >> b) someone/anyone can correct me if I'm giving a bum steer.
> >>
> >> Hope this helps,
> >>
> >> Charlie Taylor
> >> University of Florida
> >>
> >> On Nov 16, 2017, at 10:34 AM, Bennet Fauber <ben...@umich.edu> wrote:
> >>
> >> I think that OpenMPI is supposed to support SLURM integration such that
> >>
> >>   srun ./hello-mpi
> >>
> >> should work?  I built OMPI 2.1.2 with
> >>
> >> export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
> >> export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
> >>
> >> CMD="./configure \
> >>   --prefix=${PREFIX} \
> >>   --mandir=${PREFIX}/share/man \
> >>   --with-slurm \
> >>   --with-pmi \
> >>   --with-lustre \
> >>   --with-verbs \
> >>   $CONFIGURE_FLAGS \
> >>   $COMPILERS"
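> >>
> >> As a sanity check of what that configure line actually enabled, I
> >> believe the installed ompi_info should list any PMI-related components
> >> that were built (the exact component names are my assumption):
> >>
> >>   ompi_info | grep -i pmi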
> >>
> >> I have a simple hello-mpi.c (source included below), which compiles
> >> and runs with mpirun, both on the login node and in a job.  However,
> >> when I try to use srun in place of mpirun, I get instead a hung job,
> >> which upon cancellation produces this output.
> >>
> >> [bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]:
> >> PMI is not initialized
> >> [bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]:
> >> PMI is not initialized
> >> [warn] opal_libevent2022_event_active: event has no event_base set.
> >> [warn] opal_libevent2022_event_active: event has no event_base set.
> >> slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
> >> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> >> slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
> >>
> >> The Slurm web page suggests that OMPI 2.x and later support PMIx and
> >> that one should use `srun --mpi=pmix`; however, that no longer seems
> >> to be an option, and using the `openmpi` type isn't working (neither
> >> is pmi2).
> >>
> >> [bennet@beta-build hello]$ srun --mpi=list
> >> srun: MPI types are...
> >> srun: mpi/pmi2
> >> srun: mpi/lam
> >> srun: mpi/openmpi
> >> srun: mpi/mpich1_shmem
> >> srun: mpi/none
> >> srun: mpi/mvapich
> >> srun: mpi/mpich1_p4
> >> srun: mpi/mpichgm
> >> srun: mpi/mpichmx
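> >>
> >> If I understand it correctly, those types correspond to MPI plugins in
> >> Slurm's plugin directory, so something like the following should show
> >> whether a pmix plugin was built at all (the path is our RPM default
> >> and may differ on other installs):
> >>
> >>   ls /usr/lib64/slurm/mpi_*.so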
> >>
> >> To get the Intel PMI to work with srun, I have to set
> >>
> >>   I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
> >>
> >> Is there a comparable environment variable that must be set to enable
> >> `srun` to work?
> >>
> >> Am I missing a build option or misspecifying one?
> >>
> >> -- bennet
> >>
> >>
> >> Source of hello-mpi.c
> >> ==========================================
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include "mpi.h"
> >>
> >> int main(int argc, char **argv){
> >>
> >> int rank;          /* rank of process */
> >> int numprocs;      /* size of COMM_WORLD */
> >> int namelen;
> >> int tag=10;        /* expected tag */
> >> int message;       /* Recv'd message */
> >> char processor_name[MPI_MAX_PROCESSOR_NAME];
> >> MPI_Status status; /* status of recv */
> >>
> >> /* call Init, size, and rank */
> >> MPI_Init(&argc, &argv);
> >> MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
> >> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >> MPI_Get_processor_name(processor_name, &namelen);
> >>
> >> printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
> >>
> >> if(rank != 0){
> >>   MPI_Recv(&message,    /*buffer for message */
> >>                   1,    /*MAX count to recv */
> >>             MPI_INT,    /*type to recv */
> >>                   0,    /*recv from 0 only */
> >>                 tag,    /*tag of message */
> >>      MPI_COMM_WORLD,    /*communicator to use */
> >>             &status);   /*status object */
> >>   printf("Hello from process %d!\n",rank);
> >> }
> >> else{
> >>   /* rank 0 ONLY executes this */
> >>   printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
> >>   int x;
> >>   for(x=1; x<numprocs; x++){
> >>      MPI_Send(&x,          /*send x to process x */
> >>                1,          /*number to send */
> >>          MPI_INT,          /*type to send */
> >>                x,          /*rank to send to */
> >>              tag,          /*tag for message */
> >>    MPI_COMM_WORLD);        /*communicator to use */
> >>   }
> >> } /* end else */
> >>
> >>
> >> /* always call at end */
> >> MPI_Finalize();
> >>
> >> return 0;
> >> }
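> >>
> >> For reference, the compile and launch lines look roughly like this
> >> (the task counts are arbitrary):
> >>
> >>   mpicc hello-mpi.c -o hello-mpi
> >>   srun -N 2 --ntasks-per-node=2 ./hello-mpi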