Hi All,
 
I'm having this weird problem when running a very simple OpenMPI application. 
The application sends an integer from the rank 0 process to the rank 1 process. 
The sequence of code that I use to accomplish this is the following:
        if (rank == 0)
        {
                printf("Process %d - Sending...\n", rank);
                MPI_Send(&sent, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
                printf("Process %d - Sent.\n", rank);
        }
        if (rank == 1)
        {
                printf("Process %d - Receiving...\n", rank);
                MPI_Recv(&received, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &stat);
                printf("Process %d - Received.\n", rank);
        }
 
        printf("Process %d - Barrier reached.\n", rank);
        MPI_Barrier(MPI_COMM_WORLD);
        printf("Process %d - Barrier passed.\n", rank);
 
Like I said, a very simple program.
When launching this application with SLURM (using "salloc -N2 mpirun 
./<my_app>"), it hangs at the barrier. However, it passes the barrier if I 
launch it without SLURM (using "mpirun -np 2 ./<my_app>"). I first noticed this 
problem when my application hung while sending two successive messages 
from one process to another: only the first MPI_Send completed, and the second 
MPI_Send blocked indefinitely. I was wondering whether any of you have 
encountered a similar problem, or have an idea as to what is causing the 
Send/Receive pair to block when using SLURM. The exact output in my console is 
as follows:
 
        salloc: Granted job allocation 1138
        Process 0 - Sending...
        Process 1 - Receiving...
        Process 1 - Received.
        Process 1 - Barrier reached.
        Process 0 - Sent.
        Process 0 - Barrier reached.
        (it just hangs here)
 
I am new to MPI programming and to OpenMPI and would greatly appreciate any 
help. My OpenMPI version is 1.4.4 (I have also tried 1.5.4), my 
SLURM version is 0.3.3-1 (slurm-llnl 2.1.0-1), and the operating system on the 
cluster where I run my application is Ubuntu 10.04 LTS Server x64. 
If anyone is willing to help me out, I will happily provide any other info 
requested (as long as the request comes with instructions on how to get that 
info).
 
Your answers will be of great help! Thanks!
 
Adrian
