Hello, I seem to have run into an interesting problem with Open MPI. After allocating 3 processors, and confirming that the 3 processors are allocated, mpirun on a simple mpitest program runs on 4 processors. We have 2 processors per node, and I can reproduce this with any odd number of processors: Open MPI seems to take any remaining processors on the box. We are running Open MPI v1.3.3. Here is an example of what happens:
node64-test ~>salloc -n3
salloc: Granted job allocation 825
node64-test ~>srun hostname
node64-28.xxxx.xxxx.xxxx.xxxx
node64-28.xxxx.xxxx.xxxx.xxxx
node64-29.xxxx.xxxx.xxxx.xxxx
node64-test ~>MX_RCACHE=0 LD_LIBRARY_PATH="/hurd/mpi/openmpi/lib:/usr/local/mx/lib" mpirun mpi_pgms/mpitest
MPI domain size: 4
I am rank 000 - node64-28.xxxx.xxxx.xxxx.xxxx
I am rank 003 - node64-29.xxxx.xxxx.xxxx.xxxx
I am rank 001 - node64-28.xxxx.xxxx.xxxx.xxxx
I am rank 002 - node64-29.xxxx.xxxx.xxxx.xxxx

For those who may be curious, here is the program:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);               /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    MPI_Get_processor_name(processor_name, &namelen);
    fprintf(stdout, "My name is: %s\n", processor_name);
    if (rank == 0)
        fprintf(stdout, "Cluster size is: %d\n", size);

    MPI_Finalize();
    return 0;
}

I'm curious whether this is a bug in the way Open MPI interprets the SLURM environment variables. If you have any ideas or need any more information, let me know.

Thanks.
Matt
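
P.S. In case it helps narrow this down, here is a quick diagnostic I can run inside the allocation to dump the SLURM variables that, as far as I understand, Open MPI's SLURM support consults when sizing the job. The exact set of variables it reads is my assumption; I haven't confirmed it in the source.

#include <stdio.h>
#include <stdlib.h>

/* Print the SLURM environment variables that I assume Open MPI's
 * SLURM allocator looks at, to see whether the environment is
 * describing 3 tasks or 4 CPUs. */
int main(void)
{
    const char *vars[] = {
        "SLURM_NNODES",            /* number of allocated nodes */
        "SLURM_NPROCS",            /* task count requested with -n */
        "SLURM_TASKS_PER_NODE",    /* task layout per node, e.g. "2,1" */
        "SLURM_JOB_CPUS_PER_NODE", /* CPUs per node, e.g. "2(x2)" */
        NULL
    };
    int i;

    for (i = 0; vars[i] != NULL; i++) {
        const char *val = getenv(vars[i]);
        printf("%s=%s\n", vars[i], val ? val : "(unset)");
    }
    return 0;
}

If SLURM_JOB_CPUS_PER_NODE reports all the CPUs on each node while SLURM_NPROCS says 3, that would fit the behavior above, since mpirun run without -np launches one process per available slot as I understand it.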