My guess is that you have a mismatched MPI_Bcast somewhere
in the code.  Presumably, there is a call to MPI_Bcast on the head
node that broadcasts something larger than 1 MPI_INT and does not
have a matching call on the worker nodes.  Then, when the MPI_Bcast
on the worker nodes is called, it matches up with that earlier
call on the head node.

Since the head node is the root of the MPI_Bcast, it can
blast its data to the other nodes and then continue on.  I have
attached a simple program that shows the same behavior.

And yes, the Send and Recv can work independently of the
Broadcast: point-to-point messages are matched by tag, while
collective calls are matched separately, by the order in which each
rank issues them on the communicator, so the two never interfere.

Rolf

PS: Simple program at end of message.

Marcin Skoczylas wrote:

Sorry I forgot to mention: Open MPI version 1.2.4


Marcin Skoczylas wrote:
Hello,

After a whole day of coding I'm fighting a little bit with one small fragment that seems strange to me. For testing I have one head node and two worker nodes on localhost. Here is the code (with debug stuff added, like sleeps, barriers, etc.):

void CImageData::SpreadToNodes()
{
   sleep(5);
   logger->debug("CImageData::SpreadToNodes, w=%d h=%d type=%d",
                 this->width, this->height, this->type);
   logger->debug("head barrier");
   MPI_Barrier(MPI_COMM_WORLD);
   sleep(2);
   MPI_Barrier(MPI_COMM_WORLD);

   // debug 'sync' test
   logger->debug("head send SYNC str");
   char buf[5];
   buf[0] = 'S'; buf[1] = 'Y'; buf[2] = 'N'; buf[3] = 'C';
   for (int nodeId = 1; nodeId < g_NumProcesses; nodeId++) {
      MPI_Send(buf, 4, MPI_CHAR, nodeId, DEF_MSG_TAG, MPI_COMM_WORLD);
   }

   logger->debug("head bcast width: %d", this->width);
   MPI_Bcast(&(this->width), 1, MPI_INT, 0, MPI_COMM_WORLD);
   logger->debug("head bcast height: %d", this->height);
   MPI_Bcast(&(this->height), 1, MPI_INT, 0, MPI_COMM_WORLD);
   logger->debug("head bcast type: %d", this->type);
   MPI_Bcast(&(this->type), 1, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

   logger->debug("head sleep 10s");
   sleep(10);
   logger->debug("finished CImageData::SpreadToNodes");
}

// this function is declared static:
CImageData *CImageData::ReceiveFromHead()
{
   sleep(5);

   logger->debug("CImageData::ReceiveFromHead");
   MPI_Status status;
   int _width;
   int _height;
   byte _type;

   logger->debug("worker barrier");
   MPI_Barrier(MPI_COMM_WORLD);
   sleep(2);
   MPI_Barrier(MPI_COMM_WORLD);

   char buf[5];
   MPI_Recv(buf, 4, MPI_CHAR, HEAD_NODE, DEF_MSG_TAG, MPI_COMM_WORLD, &status);
   logger->debug("worker received sync str: '%c' '%c' '%c' '%c'",
                 buf[0], buf[1], buf[2], buf[3]);

   logger->debug("worker bcast width");
   MPI_Bcast(&_width, 1, MPI_INT, 0, MPI_COMM_WORLD);
   logger->debug("worker bcast height");
   MPI_Bcast(&_height, 1, MPI_INT, 0, MPI_COMM_WORLD);
   logger->debug("worker bcast type");
   MPI_Bcast(&_type, 1, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
   logger->debug("width=%d height=%d type=%d", _width, _height, _type);

   // TODO: create CImageData object, return...
   return NULL;
}


That part of the code gives me an error:
RANK 0 -> PID 17115
RANK 1 -> PID 17116
RANK 2 -> PID 17117

2007-10-02 19:50:37,829 [17115] DEBUG: CImageData::SpreadToNodes, w=768 h=576 type=1
2007-10-02 19:50:37,829 [17117] DEBUG: CImageData::ReceiveFromHead
2007-10-02 19:50:37,829 [17115] DEBUG: head barrier
2007-10-02 19:50:37,829 [17116] DEBUG: CImageData::ReceiveFromHead
2007-10-02 19:50:37,829 [17116] DEBUG: worker barrier
2007-10-02 19:50:37,829 [17117] DEBUG: worker barrier
2007-10-02 19:50:39,836 [17115] DEBUG: head send SYNC str
2007-10-02 19:50:39,836 [17115] DEBUG: head bcast width: 768
2007-10-02 19:50:39,836 [17115] DEBUG: head bcast height: 576
2007-10-02 19:50:39,836 [17115] DEBUG: head bcast type: 1
2007-10-02 19:50:39,836 [17115] DEBUG: head sleep 10s
2007-10-02 19:50:39,836 [17116] DEBUG: worker received sync str: 'S' 'Y' 'N' 'C'
2007-10-02 19:50:39,836 [17116] DEBUG: worker bcast width
[pc801:17116] *** An error occurred in MPI_Bcast
[pc801:17116] *** on communicator MPI_COMM_WORLD
[pc801:17116] *** MPI_ERR_TRUNCATE: message truncated
[pc801:17116] *** MPI_ERRORS_ARE_FATAL (goodbye)
2007-10-02 19:50:39,836 [17117] DEBUG: worker received sync str: 'S' 'Y' 'N' 'C'
2007-10-02 19:50:39,836 [17117] DEBUG: worker bcast width
[pc801:17117] *** An error occurred in MPI_Bcast
[pc801:17117] *** on communicator MPI_COMM_WORLD
[pc801:17117] *** MPI_ERR_TRUNCATE: message truncated
[pc801:17117] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 17115 on node pc801 exited on signal 15 (Terminated).


Could it be that somewhere before this part the data stream got out of sync? The project is quite large and there is a lot of communication between processes before CImageData::SpreadToNodes(), so full debugging could take hours or days; however, the data flow before this particular fragment seems to be OK. How can it be that MPI_Send/MPI_Recv gave me the correct buffer (4 chars, SYNC) while the MPI_Bcast of an MPI_INT is truncated? I ran the code under Valgrind; it didn't complain and gave exactly the same result. Can I assume that I possibly have a memory-access error somewhere before this part and have corrupted the MPI structures? How exactly does MPI_Bcast work?

Sorry for the disturbance, but I'm a little bit confused.
Thank you & greetings, Marcin

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


#include <stdio.h>
#include <mpi.h>

int
main(int argc, char **argv)
{
   int        rank;
   int        np;       /* number of processes in job */
   int        foo[2];

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &np);

   /*
    * Here is an errant MPI_Bcast.  Since this is the root, it
    * can just send the data and continue on.  An MPI_Bcast is
    * not necessarily a synchronizing collective.
    */
   if (rank == 0) {
      printf("Rank 0 broadcasting 2 MPI_INTs...\n");
      MPI_Bcast(foo, 2, MPI_INT, 0, MPI_COMM_WORLD);
   }

   MPI_Barrier(MPI_COMM_WORLD);
   if (rank == 0) {
      printf("\nEveryone has now synchronized.\n");
   }

   /*
    * Now, everyone calls a broadcast again.  However, the non-zero
    * ranks are matching up with the previous broadcast.
    */
   printf("rank %d is calling MPI_Bcast...\n", rank);

   MPI_Bcast(foo, 1, MPI_INT, 0, MPI_COMM_WORLD);

   MPI_Finalize();
   return 0;
}
