Debugging is not a straightforward task. Even posting the code doesn't necessarily help (no one may be motivated to dig in, or they can't reproduce the problem, or...). You'll just have to try different things and see what works for you. Another option is to trace the MPI calls: when a process sends a message, dump out the MPI_Send() arguments; when a receiver receives, dump the corresponding MPI_Recv() arguments; and so on. That shows what the program is doing in terms of MPI, and may get you to suggestion B below.
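As a rough illustration of dumping the arguments by hand, here is a minimal sketch in C. The helper name format_trace and the record layout are made up for this example and are not part of MPI; the helper only formats a line and makes no MPI calls itself.

```c
#include <stdio.h>

/* Hypothetical helper (not part of MPI): format one trace record.
 * op is "send" or "recv"; peer is the destination rank for a send
 * and the source rank for a receive.
 * Returns the number of characters snprintf wanted to write. */
int format_trace(char *buf, size_t len, const char *op,
                 int rank, int peer, int tag, int count)
{
    return snprintf(buf, len, "[rank %d] %s peer=%d tag=%d count=%d",
                    rank, op, peer, tag, count);
}
```

Call it right before MPI_Send() with the same dest, tag, and count, and print the buffer to stderr. On the receiving side it is more informative to log after MPI_Recv() returns, using the MPI_SOURCE and MPI_TAG fields of the MPI_Status, since those show which message was actually matched.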

How do you trace and sort through the resulting data? That's another tough question. Among other things, if you can't find a tool that fits your needs, you can use the PMPI layer (the MPI profiling interface) to write wrappers. Writing wrappers is like inserting printf() statements, but doesn't quite have the same amount of moral shame associated with it!

Prentice Bisbal wrote:

Choose one:

A) Post only the relevant sections of the code. If you have a syntax
error, it should be in the Send and Receive calls, or one of the lines
where the data is copied or read from the array/buffer/whatever that
you're sending or receiving.

B) Try reproducing your problem in a toy program that has only enough
code to reproduce your problem. For example, create an array, populate
it with data, send it, and then on the receiving end, receive it, and
print it out. Something simple like that. I find when I do that, I
usually find the error in my code.

Jack Bryan wrote:
But my code is too long to post: dozens of files, thousands of lines. Do you have any better ideas? Any help is appreciated.
Nov. 5 2010
------------------------------------------------------------------------
From: solarbik...@gmail.com
Date: Fri, 5 Nov 2010 11:20:57 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI data transfer error

As Prentice said, we can't help you without seeing your code. Open MPI
has stood many trials from many programmers, with many bugs ironed out,
so it is unlikely that Open MPI is the source of your error. Without
seeing your code, the only logical conclusion is that something is wrong
with your program.

On Fri, Nov 5, 2010 at 10:52 AM, Prentice Bisbal <prent...@ias.edu> wrote:

   We can't help you with your coding problem without seeing your code.


   Jack Bryan wrote:
   > Thanks,
   > I have used "cout" in C++ to print the values of the data.
   >
   > The sender sends the correct data to the correct receiver.
   >
   > But the receiver gets the wrong data from the correct sender.
   >
   > Why?
   >
   > Thanks
   >
   > Nov. 5 2010
   >
   >> Date: Fri, 5 Nov 2010 08:54:22 -0400
   >> From: prent...@ias.edu
   >> To: us...@open-mpi.org
   >> Subject: Re: [OMPI users] Open MPI data transfer error
   >>
   >> Jack Bryan wrote:
   >> >
   >> > Hi,
   >> >
   >> > In my Open MPI program, one master sends data to 3 workers.
   >> >
   >> > Two workers can receive their data.
   >> >
   >> > But the third worker cannot get its data.
   >> >
   >> > Before sending data, the master sends header information to each
   >> > worker so that each worker knows what the following data package
   >> > is (such as length, package tag).
   >> >
   >> > The third worker can get its header information message from the
   >> > master but cannot get its correct data package.
   >> >
   >> > It got the data that should be received by the first worker, which
   >> > gets its correct data.
   >> >
   >>
   >>
   >> Jack,
   >>
   >> Providing the relevant sections of code here would be very helpful.
   >>
   >> <inside joke>
   >> I would tell you to add some printf statements to your code to see
   >> what data is stored in your variables on the master before it sends
   >> them to each node, but Jeff Squyres and I agreed to disagree in a
   >> civil manner on that debugging technique earlier this week, and I'd
   >> hate to re-open those old wounds by suggesting that technique here. ;)
   >> </inside joke>
