Hmm, strange. It doesn't hang for me and AFAICS it shouldn't hang at
all. I'm using Open MPI 1.2.5. Which version are you using?

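Hanging with 100% CPU utilization often means that your processes are
caught in a busy wait. Since your node is oversubscribed, you could try
setting the MCA parameter mpi_yield_when_idle, which makes a waiting
process give up the CPU instead of spinning: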

> gentryx@hex ~ $ cat .openmpi/mca-params.conf
> mpi_yield_when_idle=1

But I don't think this should be necessary.


On 21:35 Mon 17 Mar, Giovani Faccin wrote:
> Hi there!
> I'm learning MPI,  and got really puzzled... Please take a look at this very 
> short code:
> #include <iostream>
> #include "mpicxx.h"
> using namespace std;
> int main(int argc, char *argv[])
> {
>     MPI::Init();        
>     for (unsigned long t = 0; t < 10000000; t++)
>     {
>         //If we are process 0:
>         if ( MPI::COMM_WORLD.Get_rank() == 0 )
>         {
>             MPI::Status mpi_status;
>             unsigned long d = 0;
>             unsigned long d2 = 0;
>             MPI::COMM_WORLD.Recv(&d, 1, MPI::UNSIGNED_LONG, MPI::ANY_SOURCE,
>                                  MPI::ANY_TAG, mpi_status );
>             MPI::COMM_WORLD.Recv(&d2, 1, MPI::UNSIGNED_LONG, MPI::ANY_SOURCE,
>                                  MPI::ANY_TAG, mpi_status );
>             cout << "Time = " << t << "; Node 0 received: " << d << " and "
>                  << d2 << endl;
>         }
>         //Else:
>         else
>         {
>             unsigned long  d = MPI::COMM_WORLD.Get_rank();
>             MPI::COMM_WORLD.Send( &d, 1, MPI::UNSIGNED_LONG, 0, 0);
>         };
>     };
>     MPI::Finalize();
> }
> Ok, so what I'm trying to do is to make a gather operation using point to 
> point communication. In my real application instead of sending an unsigned 
> long I'd be calling an object's send and receive methods, which in turn would 
> call their inner object's similar methods and so on until all data is 
> synchronized. I'm using this loop because the number of objects to be sent to 
> process rank 0 varies depending on the sender.
> When running this test with 3 processes on a dual core, oversubscribed node, 
> I get this output:
> (skipped previous output)
> Time = 5873; Node 0 received: 1 and 2
> Time = 5874; Node 0 received: 1 and 2
> Time = 5875; Node 0 received: 1 and 2
> Time = 5876; Node 0 received: 1 and 2
> and then the application hangs, with processor usage at 100%. The exact time 
> when this condition occurs varies on each run, but it usually happens quite 
> fast.
> What would I have to modify, in this simple example, so that the application 
> works as expected? Must I always use Gather, instead of point to point, to 
> make a synchronization like this?
> Thank you very much!
> Giovani
Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
