Hi Joshua, I don't think the new built-in rsh in later versions of Grid Engine is going to make any difference - the orted is the real starter of the MPI tasks and should have a greater influence on the task environment.
However, it would help if you could record the nice value and resource limits of each of the MPI tasks - you can easily do so by using a shell wrapper like this one:

========================================
#!/bin/sh

# resource limits
ulimit -a > /tmp/mpijob.$$

# nice value of this wrapper only (append, so both end up in the same file)
ps -o pid,user,nice,command -p $$ >> /tmp/mpijob.$$

# run real executable
<PATH to real executable>

exit $?
========================================

Use mpirun to launch it as if it were the real MPI application - then you can see whether limits introduced by Grid Engine are causing the issue...

Rayson

=================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


On Thu, Mar 15, 2012 at 12:28 AM, Joshua Baker-LePain <jl...@duke.edu> wrote:
> On Thu, 15 Mar 2012 at 12:44am, Reuti wrote
>
>> Which version of SGE are you using? The traditional rsh startup was
>> replaced by the builtin startup some time ago (although it should still
>> work).
>
> We're currently running the rather ancient 6.1u4 (due to the "If it ain't
> broke..." philosophy). The hardware for our new queue master recently
> arrived and I'll soon be upgrading to the most recent Open Grid Scheduler
> release. Are you saying that the upgrade with the new builtin startup
> method should avoid this problem?
>
>> Maybe this shows already the problem: there are two `qrsh -inherit`, as
>> Open MPI thinks these are different machines (I ran only with one slot on
>> each host hence didn't get it first but can reproduce it now). But for SGE
>> both may end up in the same queue overriding the openmpi-session in $TMPDIR.
>>
>> Although it's running: you get all output? If I request 4 slots and get
>> one from each queue on both machines the mpihello outputs only 3 lines: the
>> "Hello World from Node 3" is always missing.
>
> I do seem to get all the output -- there are indeed 64 Hello World lines.
>
> Thanks again for all the help on this. This is one of the most productive
> exchanges I've had on a mailing list in far too long.
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
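
For reference, the wrapper idea from the reply can also be written so that the real executable is passed as arguments instead of being hard-coded. This is only a sketch under assumptions that are not in the thread: the names `record_task_env`, `run_with_env_log`, and `MPI_ENV_LOG_DIR`, and the log file naming with the host name, are all made up for illustration.

```shell
#!/bin/sh
# Sketch of a per-task logging wrapper. The function names, the
# MPI_ENV_LOG_DIR variable and the log file naming are hypothetical.

# Directory for the per-task logs; defaults to /tmp as in the original wrapper.
LOG_DIR="${MPI_ENV_LOG_DIR:-/tmp}"

# Write the resource limits and nice value of this shell to a log file
# named after the host and the wrapper's PID, then print the log path.
record_task_env() {
    log="$LOG_DIR/mpijob.$(uname -n).$$"
    {
        echo "== resource limits =="
        ulimit -a
        echo "== nice value =="
        ps -o pid,user,nice,args -p $$
    } > "$log" 2>&1
    echo "$log"
}

# Record the environment, then run the real executable with its arguments
# and return its exit status.
run_with_env_log() {
    record_task_env > /dev/null
    "$@"
}

# Only act when a command was given on the command line.
if [ "$#" -gt 0 ]; then
    run_with_env_log "$@"
    exit $?
fi
```

If this were saved as `wrapper.sh` (a hypothetical name), it could be launched as `mpirun -np 4 ./wrapper.sh ./mpihello`. Each task would then leave one log file on the host where it ran, so the limits seen under Grid Engine can be compared with those from an interactive run.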