[MTT users] Status

2006-08-30 Thread Jeff Squyres
Josh noticed that Test Run data is not currently being recorded. I actually had already filed ticket #42 about this -- just to let you all know, we're aware of the problem and Ethan is working on it. Also, I just brought over the CSH script fix that Josh identified earlier (i.e., the sourceable s

Re: [MTT users] Tests timing out

2006-08-30 Thread Jeff Squyres
Hum. Ok, let me go check MTT code... Yes, here's the relevant code: if ($mpi_install->{libdir}) { if (exists($ENV{LD_LIBRARY_PATH})) { $ENV{LD_LIBRARY_PATH} = "$mpi_install->{libdir}:" . $ENV{LD_LIBRARY_PATH}; } else { $ENV{LD_LIBRARY_P
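For readers who want to try the prepend-or-set logic quoted above outside of MTT, here is a minimal sketch in POSIX sh. It is not MTT's Perl code; the directory name and the helper function `new_ld_library_path` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical library directory, standing in for the install's libdir.
libdir="/tmp/example-mpi-install/lib"

# Print the value LD_LIBRARY_PATH should take once the given directory is
# placed first. ${LD_LIBRARY_PATH+set} expands to "set" only when the
# variable exists, which separates "unset" from "set but empty".
new_ld_library_path() {
  if [ -n "${LD_LIBRARY_PATH+set}" ]; then
    printf '%s:%s\n' "$1" "$LD_LIBRARY_PATH"
  else
    printf '%s\n' "$1"
  fi
}

new_value=$(new_ld_library_path "$libdir")
echo "LD_LIBRARY_PATH would become: $new_value"
```

The sketch only prints the new value rather than exporting it, so running it does not change the loader search path of the calling shell.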

[MTT users] Wiki page

2006-08-30 Thread Jeff Squyres
BTW, per Josh's suggestion, I added: http://svn.open-mpi.org/trac/mtt/wiki/Troubleshooting Feel free to edit/expand... -- Jeff Squyres Server Virtualization Business Unit Cisco Systems

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
On Aug 30, 2006, at 2:49 PM, Jeff Squyres wrote: More specifically -- I have now fixed this on the trunk (that the csh file that gets dropped emits if (\$?LD_LIBRARY_PATH == 0) then setenv LD_LIBRARY_PATH $ret->{libdir} else setenv LD_LIBRARY_PATH $ret->{libdir}:\$LD_LIBRARY_PATH end

Re: [MTT users] Tests timing out

2006-08-30 Thread Jeff Squyres
More specifically -- I have now fixed this on the trunk (that the csh file that gets dropped emits if (\$?LD_LIBRARY_PATH == 0) then setenv LD_LIBRARY_PATH $ret->{libdir} else setenv LD_LIBRARY_PATH $ret->{libdir}:\$LD_LIBRARY_PATH endif\n"; (I didn't have the $? escaped before, so it emi

Re: [MTT users] Tests timing out

2006-08-30 Thread Jeff Squyres
? How could that fix your hanging? The code I'm talking about in MTT is the part that drops those files. We don't actually *use* those files in MTT anywhere -- they're solely for humans to use after the fact... If you're suddenly running properly, I'm suspicious. :-( On 8/30/06 2:38 PM, "Jos

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
This fixes the hanging and gets me running (and passing) some/most of the tests [Trivial and ibm]. Yay! I have a 16 processor job running on Odin at the moment that seems to be going well so far. Thanks for your help. Want me to file a bug about the tcsh problem below? -- Josh On Aug 30,

Re: [MTT users] Tests timing out

2006-08-30 Thread Jeff Squyres
Bah! This is the result of Perl expanding $? to 0 -- it seems that I need to escape $? so that it's not output as 0. Sorry about that! So is this just for the sourcing files, or for your overall (hanging) problems? On 8/30/06 2:28 PM, "Josh Hursey" wrote: > So here are the results of my expl
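The same class of bug can be reproduced without Perl. The sketch below is an analogue in POSIX sh, not MTT's actual generator: an unquoted here-document expands `$?` (the last exit status) just as a Perl double-quoted string does, so the csh test is corrupted unless the dollar sign is escaped. The directory and the function names `emit_broken` and `emit_fixed` are hypothetical.

```shell
#!/bin/sh
# Hypothetical library directory; not a path taken from the thread.
libdir="/tmp/example-mpi-install/lib"

# Broken generator: the unescaped $? is expanded by sh (to the last exit
# status) while the here-document is written, so csh never sees it.
emit_broken() {
  cat <<EOF
if ($?LD_LIBRARY_PATH == 0) then
    setenv LD_LIBRARY_PATH $libdir
else
    setenv LD_LIBRARY_PATH $libdir:\$LD_LIBRARY_PATH
endif
EOF
}

# Fixed generator: \$? reaches the output as a literal $?, leaving the
# "is this variable set?" test for csh to evaluate when the file is sourced.
emit_fixed() {
  cat <<EOF
if (\$?LD_LIBRARY_PATH == 0) then
    setenv LD_LIBRARY_PATH $libdir
else
    setenv LD_LIBRARY_PATH $libdir:\$LD_LIBRARY_PATH
endif
EOF
}

broken=$(emit_broken)
fixed=$(emit_fixed)

echo "--- broken ---"
printf '%s\n' "$broken"
echo "--- fixed ---"
printf '%s\n' "$fixed"
```

In the broken output the first line no longer contains `$?` at all, which is the symptom reported in this thread for accounts that never set LD_LIBRARY_PATH.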

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
So here are the results of my exploration. I have things running now. The problem was that the user that I am running under does not set the LD_LIBRARY_PATH variable at any point. So when MTT tries to export the variable it does: if (0LD_LIBRARY_PATH == 0) then setenv LD_LIBRARY_PATH /sa

Re: [MTT users] Tests timing out

2006-08-30 Thread Jeff Squyres
On 8/30/06 12:10 PM, "Josh Hursey" wrote: >> MTT directly sets environment variables in its own environment (via >> $ENV{whatever} = "foo") before using fork/exec to launch compiles >> and runs. >> Hence, the forked children inherit the environment variables that >> we set >> (e.g., PATH and LD_L

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
On Aug 30, 2006, at 11:36 AM, Jeff Squyres wrote: (sorry -- been afk much of this morning) MTT directly sets environment variables in its own environment (via $ENV{whatever} = "foo") before using fork/exec to launch compiles and runs. Hence, the forked children inherit the environment variab

Re: [MTT users] Tests timing out

2006-08-30 Thread Jeff Squyres
FWIW, I am pretty sure that "srun -b myscript" *used* to work. But there must be something different about the environment between the two (-A and -b)...? For one thing, mpirun is running on the first node of the allocation with -b (vs. the head node for -A), but I wouldn't think that that would

Re: [MTT users] Tests timing out

2006-08-30 Thread Jeff Squyres
(sorry -- been afk much of this morning) MTT directly sets environment variables in its own environment (via $ENV{whatever} = "foo") before using fork/exec to launch compiles and runs. Hence, the forked children inherit the environment variables that we set (e.g., PATH and LD_LIBRARY_PATH). So if
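The inheritance behaviour described in this message can be checked by hand. The following is a minimal sketch in POSIX sh rather than MTT's Perl, and the install prefix is a hypothetical placeholder: a variable changed and exported in the parent is visible to any child process started afterwards, with no further action needed.

```shell
#!/bin/sh
# Hypothetical install prefix, for illustration only.
prefix="/tmp/example-mpi-install"

# Put the (hypothetical) bin directory first on PATH and export it, so
# every child process started from this shell inherits the new value.
PATH="$prefix/bin:$PATH"
export PATH

# A child shell reports the PATH it was handed by its parent.
child_path=$(sh -c 'printf "%s" "$PATH"')

case "$child_path" in
  "$prefix/bin:"*) echo "child inherited the updated PATH" ;;
  *)               echo "child did NOT inherit the updated PATH" ;;
esac
```

The change is one-way: it reaches children started after the export, but it never propagates back to the shell that launched the parent, which is why a login shell that sources nothing will not see these settings.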

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
I'm trying to replicate the MTT environment as much as possible, and have a couple of questions. Assume there is no mpirun in my PATH/LD_LIBRARY_PATH when I start MTT. After MTT builds Open MPI, how does it export these variables so that it can build the tests? How does it export these when

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
I already tried that. However I'm trying it in a couple different ways and getting some mixed results. Let me formulate the error cases and get back to you. Cheers, Josh On Aug 30, 2006, at 10:17 AM, Ralph H Castain wrote: Well, why don't you try first separating this from MTT? Just run t

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
Yet another point (sorry for the spam). This may not be an MTT issue but a broken ORTE on the trunk :( When I try to run in an allocation (srun -N 16 -A) things run fine. But if I try to run in batch mode (srun -N 16 -b myscript.sh) then I see the same hang as in MTT. Seems that mpirun is no

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
forgot this bit in my mail. With the mpirun just hanging out there I attached GDB and got the following stack trace: (gdb) bt #0 0x003d1b9bd1af in poll () from /lib64/tls/libc.so.6 #1 0x002a956e6389 in opal_poll_dispatch (base=0x5136d0, arg=0x513730, tv=0x7fbfffee70) at poll.c:191 #2

Re: [MTT users] Tests timing out

2006-08-30 Thread Josh Hursey
On Aug 30, 2006, at 7:19 AM, Jeff Squyres wrote: On 8/29/06 8:57 PM, "Josh Hursey" wrote: Does this apply to *all* tests, or only some of the tests (like allgather)? All of the tests: Trivial and ibm. They all timeout :( Blah. The trivial tests are simply "hello world", so they should

Re: [MTT users] OMPI snapshot tarball generation

2006-08-30 Thread Jeff Squyres
Heh. Sent that too soon -- I meant to refer to: http://www.open-mpi.org/community/lists/devel/2006/08/1018.php On 8/30/06 8:38 AM, "Jeff Squyres" wrote: > FYI -- see: > > > > This means that MTT will potentially have to test less stuff. More > specifically, MTT will only have a tarbal

[MTT users] OMPI snapshot tarball generation

2006-08-30 Thread Jeff Squyres
FYI -- see: This means that MTT will potentially have to test less stuff. More specifically, MTT will only have a tarball to test when there is actually something new to test. Hence, this can significantly decrease the probability of there being 1.1 and 1.0 tarballs to test, and therefore lower

Re: [MTT users] Tests timing out

2006-08-30 Thread Jeff Squyres
On 8/29/06 8:57 PM, "Josh Hursey" wrote: >> Does this apply to *all* tests, or only some of the tests (like >> allgather)? > > All of the tests: Trivial and ibm. They all timeout :( Blah. The trivial tests are simply "hello world", so they should take just about no time at all. Is this runnin

[MTT users] Update your checkouts

2006-08-30 Thread Jeff Squyres
We moved a few fixes and improvements over to the MTT release branch yesterday; you probably want to run "svn up" in your MTT checkouts. I also added a "tips and tricks" section to the wiki on the OMPI Testing page for some of the gotchas that have occurred so far. Indeed, we'll be carefully moni