Jeff,

You broke my ksh (and I suspect other shells too)
Today's SVN 1.4a1r19757
orte/mca/plm/rsh/plm_rsh_module.c
line 471:
        tmp = opal_argv_split("( test ! -r ./.profile || . ./.profile;", ' ');
                               ^
                               ARGHH
The '(' is never closed, so ksh chokes on it.  Dropping it:
        tmp = opal_argv_split(" test ! -r ./.profile || . ./.profile;", ' ');
and all is well again :)
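To see the breakage in isolation (the echo here just stands in for the
real orted command line, and the exact error text varies by ksh flavor):

    $ ksh -c '( test ! -r ./.profile || . ./.profile; echo ok'
    ksh: syntax error: `(' unmatched
    $ ksh -c ' test ! -r ./.profile || . ./.profile; echo ok'
    ok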

Regards,
Mostyn

On Thu, 9 Oct 2008, Jeff Squyres wrote:

FWIW, the fix has been pushed into the trunk, 1.2.8, and 1.3 SVN branches. So I'll probably take down the hg tree (we use those as temporary branches).

On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:

Hi,

Thanks for providing a fix, and sorry for the delay in responding.  Since I found out about -x, I've been busy working on the rest of our code, so I haven't had time to try out the fix.  I'll take a look at it as soon as I can and will let you know how it works out.

Hahn

On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:

On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:

you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
possibly others, such as that LICENSE key, etc.) regardless of
whether it's an interactive or non-interactive login.

Right, that's exactly what I want to do.  I was hoping that mpirun
would run .profile as the FAQ page stated, but the -x fix works for
now.

If you're using Bash, it should be running .bashrc.  But it looks like
you did identify a bug that we're *not* running .profile.  I have a
Mercurial branch up with a fix if you want to give it a spin:

  http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/

I just realized that I'm using .bash_profile on the x86 and need to
move its contents into .bashrc and call .bashrc from .bash_profile,
since eventually I will also be launching MPI jobs onto other x86
processors.
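
For reference, the usual way to do that split is to keep all the settings
in .bashrc and have .bash_profile defer to it:

    # ~/.bash_profile: login shells read this file; non-login interactive
    # shells read ~/.bashrc, so keep everything there and just source it.
    if [ -f ~/.bashrc ]; then
        . ~/.bashrc
    fi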

Thanks to everyone for their help.

Hahn

On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:

On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:

Regarding 1., we're actually using 1.2.5.  We started using Open MPI
last winter and just stuck with it.  For now, using the -x flag with
mpirun works.  If this really is a bug in 1.2.7, then I think we'll
stick with 1.2.5 for now, then upgrade later when it's fixed.

It looks like this behavior has been the same throughout the entire
1.2 series.

Regarding 2., are you saying I should run the commands you suggest
from the x86 node running bash, so that ssh logs into the Cell node
running Bourne?

I'm saying that if "ssh othernode env" gives different answers than
"ssh othernode"/"env", then your .bashrc or .profile or whatever is
dumping out early depending on whether you have an interactive login
or not.  This is the real cause of the error -- you probably want to
set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such
as that LICENSE key, etc.) regardless of whether it's an interactive
or non-interactive login.
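
A common way a .bashrc "dumps out early" is an interactivity guard near
the top; whether yours has one is an assumption, but the pattern looks
like this:

    # Nothing below this line runs for non-interactive shells (e.g.,
    # "ssh node cmd"), because $PS1 is unset in that case:
    [ -z "$PS1" ] && return

    export LD_LIBRARY_PATH=...   # settings down here never take effect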


When I run "ssh othernode env" from the x86 node, I get the
following vanilla environment:

USER=ha17646
HOME=/home/ha17646
LOGNAME=ha17646
SHELL=/bin/sh
PWD=/home/ha17646

When I run "ssh othernode" from the x86 node, then run "env" on the
Cell, I get the following:

USER=ha17646
LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
HOME=/home/ha17646
MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
LOGNAME=ha17646
TERM=xterm-color
PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
SHELL=/bin/sh
PWD=/home/ha17646
TZ=EST5EDT

Hahn

On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:

Ralph and I just talked about this a bit:

1. In all released versions of OMPI, we *do* source the .profile file on
the target node if it exists (because vanilla Bourne shells do not source
anything on remote nodes -- Bash does, though, per the FAQ).  However,
looking in 1.2.7, it looks like it might not be executing that code --
there *may* be a bug in this area.  We're checking into it.

2. You might want to check your configuration to see if your .bashrc is
dumping out early because it's a non-interactive shell.  Check the output
of:

ssh othernode env
vs.
ssh othernode
env

(i.e., a non-interactive running of "env" vs. an interactive login and
running "env")



On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:

I am unaware of anything in the code that would "source .profile"
for you. I believe the FAQ page is in error here.

Ralph

On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:

Great, that worked, thanks!  However, it still concerns me that the FAQ
page says that mpirun will execute .profile, which doesn't seem to work
for me.  Are there any configuration issues that could possibly be
preventing mpirun from doing this?  It would certainly be more convenient
if I could maintain my environment in a single .profile file instead of
adding what could potentially be a lot of -x arguments to my mpirun
command.

Hahn

On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:

You can forward your local env with mpirun -x LD_LIBRARY_PATH.  As an
alternative, you can set specific values with mpirun -x
LD_LIBRARY_PATH=/some/where:/some/where/else.  More information with
mpirun --help (or man mpirun).
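
For example, using the host and library path mentioned elsewhere in this
thread (./my_app is just a placeholder):

    # forward whatever LD_LIBRARY_PATH is set to on the launching node
    mpirun -x LD_LIBRARY_PATH -np 1 --host cab0 ./my_app

    # or pin it to an explicit value for the remote node
    mpirun -x LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32 \
           -np 1 --host cab0 ./my_app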

Aurelien



On Oct 6, 2008, at 4:06 PM, Hahn Kim wrote:

Hi,

I'm having difficulty launching an Open MPI job onto a machine that is
running the Bourne shell.

Here's my basic setup.  I have two machines: one is an x86-based machine
running bash and the other is a Cell-based machine running the Bourne
shell.  I'm running mpirun from the x86 machine, which launches a C++ MPI
application onto the Cell machine.  I get the following error:

error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

The basic problem is that LD_LIBRARY_PATH needs to be set to the directory
that contains libstdc++.so.6 for the Cell.  I set the following line in
.profile:

export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32

which is the path to the PPC libraries for Cell.

Now if I log directly into the Cell machine and run the program directly
from the command line, I don't get the above error.  But mpirun still
fails, even after setting LD_LIBRARY_PATH in .profile.

As a sanity check, I ran the following command from the x86 machine:

mpirun -np 1 --host cab0 env

which, among others things, shows me the following value:

LD_LIBRARY_PATH=/tools/openmpi-1.2.5/lib:

If I log into the Cell machine and run env directly from the command line,
I get the following value:

LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32

So it appears that .profile gets sourced when I log in but not when
mpirun runs.

However, according to the Open MPI FAQ
(http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),
mpirun is supposed to source .profile directly, since the Bourne shell
doesn't automatically call it for non-interactive shells.
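
One quick way to test that claim is to drop a marker into .profile on the
Cell node and see who picks it up (PROFILE_READ is a made-up variable used
only for this check):

    # on the Cell node, add to ~/.profile:
    export PROFILE_READ=yes

    # then, from the x86 machine:
    ssh cab0 'echo ${PROFILE_READ:-unset}'          # plain non-interactive ssh
    mpirun -np 1 --host cab0 printenv PROFILE_READ  # what mpirun actually does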

Does anyone have any insight as to why my environment isn't being set
properly?  Thanks!

Hahn

--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255









--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321








--
Hahn Kim
MIT Lincoln Laboratory   Phone: (781) 981-0940
244 Wood Street, S2-252  Fax: (781) 981-5255
Lexington, MA 02420      E-mail: h...@ll.mit.edu








--
Jeff Squyres
Cisco Systems













_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
