Hi Jeff,

Sorry, speaking in shorthand again.

Jeff Squyres wrote:
On Jan 8, 2010, at 5:03 PM, Peter Thompson wrote:

I've tried a few builds of 1.4 on Snow Leopard, and trying to start up TotalView
gets some of the more 'standard' problems.

I don't quite know what you mean by "standard" problems...?

That's more or less the 'standard problems' I hear described when someone tries to build an MPI (not just OpenMPI) and things don't work on the first try. I don't know if you've worked on the interface directly, but you are probably aware that TotalView has an API where we set up a structure, MPIR_PROCTABLE, based on a typedef MPIR_PROCDESC, which gets filled in with which processes are started on which nodes. That allows the debugger to attach to things automatically. If the build is done so that the files that hold these structures are optimized, sometimes the typedef is optimized away. Or, in the case of other builds, the file may have the correct optimization (none) but the symbol info is stripped in the link phase. So it's a typical, or 'standard', issue I face, but hopefully not for you.
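For anyone following along, here is a rough sketch of the kind of declarations involved. The struct layout and the lower-case spelling MPIR_proctable follow the MPIR process acquisition interface as I understand it; the publish_proctable() helper, its arguments, and the sequential pids are made up purely for illustration and are not how any particular mpirun fills the table.

```c
#include <assert.h>
#include <stdlib.h>

/* Per-process descriptor. The debugger learns this layout from the debug
   info, which is why the typedef must survive compilation and linking. */
typedef struct {
    char *host_name;        /* node the process runs on */
    char *executable_name;  /* image the process was started from */
    int   pid;              /* process id on that node */
} MPIR_PROCDESC;

/* Symbols the debugger looks up in the starter process (mpirun/mpiexec). */
MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
volatile int MPIR_being_debugged = 0;

/* The debugger plants a breakpoint here and reads the table when it hits. */
void MPIR_Breakpoint(void)
{
}

/* Hypothetical helper (not part of any real launcher): fill the table for
   nprocs processes on one host, with made-up sequential pids. */
int publish_proctable(int nprocs, char *host, char *exe, int first_pid)
{
    int i;

    assert(host != NULL && exe != NULL);
    if (nprocs <= 0)
        return -1;

    MPIR_proctable = malloc((size_t)nprocs * sizeof *MPIR_proctable);
    if (MPIR_proctable == NULL)
        return -1;

    for (i = 0; i < nprocs; i++) {
        MPIR_proctable[i].host_name = host;
        MPIR_proctable[i].executable_name = exe;
        MPIR_proctable[i].pid = first_pid + i;
    }
    MPIR_proctable_size = nprocs;

    /* Tell an attached debugger that the table is ready to be read. */
    MPIR_Breakpoint();
    return 0;
}
```

If the file holding these definitions is built without -g, or the debug info is stripped at link time, the debugger can still find the MPIR_proctable address but has no type information to interpret what it points to, which matches the two failure modes described above.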

Either the typedef for MPIR_PROCDESC
can't be found, or MPIR_PROCTABLE is missing.  You can get things to work if you
start up TotalView first and then pick your program and go to the Parallel tab
and pick OpenMPI.  But it would be nice to get the classic launch working as 
well.

I'm unclear on how you could find these symbols if you start TV first, etc., 
but it won't work automatically.

One of the solutions we came up with to work around this problem was to start up TotalView a different way, so that we need not rely on the symbol information at all. If you start TotalView the 'classic' way, mpirun/mpiexec -tv -np 4 ./foo, it will look for MPIR_PROCTABLE and the others. If you use the newer 'indirect' launch, we actually start up the debug servers with MPI, and then use some cached info to figure out the correct process to start up with the debug servers and how many processes to start. With this method, the symbol information is not needed. This method works with OpenMPI on just about all platforms. However, some users prefer the classic launch with -tv, and this seems to be failing with the latest builds I've done on Darwin. The debug info appears to be preserved in the .o files, but does not always seem complete. It probably needs another look on my part, to make sure I'm doing it right. The fact that Snow Leopard (and maybe some earlier releases) now includes OpenMPI also confuses the issue: the version that comes with Darwin does NOT contain the symbol info, and it's easy to pick up the native OpenMPI by accident instead of the build you intended.

Does that make any more sense?

I'll try playing around with 1.4.1 and see if it's me, or the compilers, or maybe OpenMPI.

PeterT


Do you have deeper knowledge (given your email address) on exactly what is 
going wrong?
