On Tue, Mar 09, 2010 at 05:43:02PM +0100, Ramon wrote:
> Am I the only one experiencing such problem?  Is there any solution?

No, you are not the only one.  Several others have mentioned the "busy
wait" problem.

The response from the OpenMPI developers, as I understand it, is that
the MPI job should be the only one running, so a 100% busy wait is not
a problem.  I hope the OpenMPI developers will correct me if I have
mis-stated their position.

I posted my cure for the problem some time ago.  I have attached it
again to this message.

Hope that helps,
Douglas.


> Ramon wrote:
>> Hi,
>>
>> I've recently been trying to develop a client-server distributed file
>> system (for my thesis) using MPI.  The communication between the
>> machines is working great; however, whenever the MPI_Comm_accept()
>> function is called, the server starts consuming 100% of the CPU.
>>
>> One interesting thing is that I tried to compile the same code using  
>> the LAM/MPI library and the mentioned behaviour could not be observed.
>>
>> Is this a bug?
>>
>> On a side note, I'm using Ubuntu 9.10's default OpenMPI deb package.   
>> Its version is 1.3.2.
>>
>> Regards
>>
>> Ramon.
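
The pattern Ramon describes looks roughly like the sketch below (illustrative
only: the file name, the port handling, and the use of MPI_COMM_SELF are my
assumptions, not his thesis code).  With Open MPI 1.3.x, the MPI_Comm_accept()
call is the point where the server spins at 100% CPU while waiting for a
client to connect:

/* accept_server.c - illustrative sketch only, not Ramon's code. */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
  char port_name[MPI_MAX_PORT_NAME];
  MPI_Comm client;

  MPI_Init(&argc, &argv);
  MPI_Open_port(MPI_INFO_NULL, port_name);
  printf("server listening on port: %s\n", port_name);

  /* Blocks until a client calls MPI_Comm_connect(); with Open MPI
   * 1.3.x this wait is a busy poll, hence the 100% CPU. */
  MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

  /* ... serve requests over 'client' ... */

  MPI_Comm_disconnect(&client);
  MPI_Close_port(port_name);
  MPI_Finalize();
  return 0;
}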

-- 
  Douglas Guptill                       voice: 902-461-9749
  Research Assistant, LSC 4640          email: douglas.gupt...@dal.ca
  Oceanography Department               fax:   902-494-3877
  Dalhousie University
  Halifax, NS, B3H 4J1, Canada

/*
 * Intercept MPI_Recv, and
 * call PMPI_Irecv, loop over PMPI_Request_get_status and sleep, until done
 *
 * Revision History:
 *  2008-12-17: copied from MPI_Send.c
 *  2008-12-18: tweaking.
 *
 * See MPI_Send.c for additional comments, 
 *  especially w.r.t. PMPI_Request_get_status.
 **/

#include "mpi.h"
#define _POSIX_C_SOURCE 199309 
#include <time.h>

int MPI_Recv(void *buff, int count, MPI_Datatype datatype,
             int from, int tag, MPI_Comm comm, MPI_Status *status) {

  int flag, rc;
  long nsec_start = 1000, nsec_max = 100000;
  struct timespec ts;
  MPI_Request req;
  MPI_Status local_status;

  ts.tv_sec  = 0;
  ts.tv_nsec = nsec_start;

  /* Post the receive without blocking, then poll for completion,
   * sleeping between polls.  The sleep doubles each iteration, up to
   * nsec_max, so the CPU is yielded instead of busy-waited. */
  rc = PMPI_Irecv(buff, count, datatype, from, tag, comm, &req);
  if (rc != MPI_SUCCESS) return rc;

  do {
    nanosleep(&ts, NULL);
    ts.tv_nsec *= 2;
    ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
    rc = PMPI_Request_get_status(req, &flag, &local_status);
  } while (rc == MPI_SUCCESS && !flag);

  /* PMPI_Request_get_status does not release the request, so free it
   * here; then copy the status back unless the caller ignores it. */
  if (flag)
    PMPI_Request_free(&req);
  if (status != MPI_STATUS_IGNORE)
    *status = local_status;

  return rc;
}
/*
 * Intercept MPI_Send, and
 * call PMPI_Isend, loop over PMPI_Request_get_status and sleep, until done
 *
 * Revision History:
 *  2008-12-12: skeleton by Jeff Squyres <jsquy...@cisco.com>
 *  2008-12-16->18: adding parameters, variable wait, 
 *     change MPI_Test to MPI_Request_get_status
 *      Douglas Guptill <douglas.gupt...@dal.ca>
 **/

/* When we use this:
 *   PMPI_Test(&req, &flag, &status); 
 * we get:
 * dguptill@DOME:$ mpirun -np 2 mpi_send_recv_test_mine
 * This is process            0  of            2 .
 * This is process            1  of            2 .
 * error: proc            0 ,mpi_send returned -1208109376
 * error: proc            1 ,mpi_send returned -1208310080
 *     1 changed to            3
 *
 * Using MPI_Request_get_status cures the problem.
 *
 * A read of mpi21-report.pdf confirms that MPI_Request_get_status
 * is the appropriate choice, since there seems to be something
 * between the call to MPI_SEND (MPI_RECV) in my FORTRAN program
 * and MPI_Send.c (MPI_Recv.c)
 **/


#include "mpi.h"
#define _POSIX_C_SOURCE 199309 
#include <time.h>

int MPI_Send(void *buff, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {

  int flag, rc;
  long nsec_start = 1000, nsec_max = 100000;
  struct timespec ts;
  MPI_Request req;
  MPI_Status status;

  ts.tv_sec  = 0;
  ts.tv_nsec = nsec_start;

  /* Start the send without blocking, then poll for completion with an
   * exponentially growing sleep (capped at nsec_max) so the CPU is
   * yielded instead of busy-waited. */
  rc = PMPI_Isend(buff, count, datatype, dest, tag, comm, &req);
  if (rc != MPI_SUCCESS) return rc;

  do {
    nanosleep(&ts, NULL);
    ts.tv_nsec *= 2;
    ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
    rc = PMPI_Request_get_status(req, &flag, &status);
  } while (rc == MPI_SUCCESS && !flag);

  /* PMPI_Request_get_status does not release the request; free it. */
  if (flag)
    PMPI_Request_free(&req);

  return rc;
}
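
To apply the cure, compile the two wrappers together with the application (or
link their object files in ahead of the MPI library) so that calls to MPI_Send
and MPI_Recv resolve to the versions above, which do the real work through the
PMPI_ entry points.  Below is a minimal driver for checking that the
interception builds and runs; it is a hypothetical example, not the
mpi_send_recv_test_mine program mentioned in the comments:

/* test_wrappers.c - hypothetical check: rank 1 sends one integer to
 * rank 0 through the intercepted MPI_Send/MPI_Recv above.
 * Build and run, for example, with
 *   mpicc MPI_Send.c MPI_Recv.c test_wrappers.c -o test_wrappers
 *   mpirun -np 2 ./test_wrappers
 */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
  int rank, size, value = 1;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  printf("This is process %d of %d.\n", rank, size);

  if (rank == 1) {
    value = 3;
    MPI_Send(&value, 1, MPI_INT, 0, 99, MPI_COMM_WORLD);    /* intercepted */
  } else if (rank == 0) {
    MPI_Recv(&value, 1, MPI_INT, 1, 99, MPI_COMM_WORLD, &status);
    printf("%d changed to %d\n", 1, value);
  }

  MPI_Finalize();
  return 0;
}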
