Hello,

I seem to have run into an interesting problem with Open MPI. After
allocating 3 processors and confirming that the 3 processors are
allocated, mpirun on a simple mpitest program seems to run on 4
processors. We have 2 processors per node, so a 3-task request spans
two nodes (4 processors in total), and Open MPI appears to start a rank
on every remaining processor in the allocation. I can repeat this with
any odd number of processors requested. We are running Open MPI v1.3.3.
Here is an example of what happens:

node64-test ~>salloc -n3
salloc: Granted job allocation 825

node64-test ~>srun hostname
node64-28.xxxx.xxxx.xxxx.xxxx
node64-28.xxxx.xxxx.xxxx.xxxx
node64-29.xxxx.xxxx.xxxx.xxxx

node64-test ~>MX_RCACHE=0 LD_LIBRARY_PATH="/hurd/mpi/openmpi/lib:/usr/local/mx/lib" mpirun mpi_pgms/mpitest
MPI domain size: 4
I am rank 000 - node64-28.xxxx.xxxx.xxxx.xxxx
I am rank 003 - node64-29.xxxx.xxxx.xxxx.xxxx
I am rank 001 - node64-28.xxxx.xxxx.xxxx.xxxx
I am rank 002 - node64-29.xxxx.xxxx.xxxx.xxxx
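
Separately from the test program shown further below, here is a small
helper I can run inside the salloc shell to dump the SLURM variables the
launcher would see. This is just a sketch; the variable names are the ones
I believe our SLURM version exports, and they may differ elsewhere.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        /* assumed SLURM variable names; adjust for your SLURM version */
        const char *vars[] = {
                "SLURM_NTASKS",            /* requested task count (newer name) */
                "SLURM_NPROCS",            /* requested task count (older name) */
                "SLURM_NNODES",
                "SLURM_NODELIST",
                "SLURM_TASKS_PER_NODE",
                "SLURM_JOB_CPUS_PER_NODE"  /* allocated CPUs per node */
        };
        size_t i;

        for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++)
        {
                const char *val = getenv(vars[i]);
                printf("%s=%s\n", vars[i], val ? val : "(unset)");
        }
        return 0;
}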



For those who may be curious, here is the program:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
        int  rank, size, namelen;
        char processor_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* every rank reports its host name */
        MPI_Get_processor_name(processor_name, &namelen);
        fprintf(stdout, "My name is: %s\n", processor_name);

        /* rank 0 also reports the communicator size */
        if (rank == 0)
                fprintf(stdout, "Cluster size is: %d\n", size);

        MPI_Finalize();
        return 0;
}


I'm curious whether this is a bug in the way Open MPI interprets the SLURM
environment variables. If you have any ideas or need any more information,
please let me know.
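
If it helps narrow things down, here is a variant of the test program I
could run that prints the SLURM task count next to the communicator size.
This is only a sketch: it assumes SLURM_NTASKS (or SLURM_NPROCS on older
SLURM versions) is exported into the job environment, which I have not
double-checked on our install.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
        {
                /* assumed variable names; may differ by SLURM version */
                const char *ntasks = getenv("SLURM_NTASKS");
                if (ntasks == NULL)
                        ntasks = getenv("SLURM_NPROCS");

                fprintf(stdout, "SLURM requested tasks: %s\n",
                        ntasks ? ntasks : "(unset)");
                fprintf(stdout, "MPI_Comm_size:         %d\n", size);
        }

        MPI_Finalize();
        return 0;
}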


Thanks.
Matt
