Hi,

I have made a bit more progress.  I think I can say the ssh
authentication problem is behind me now.  I am still having a problem
running mpirun, but my latest discovery, which I can reproduce, is
that I can run mpirun as root.  Here's the session log:

  [tsakai@vixen ec2]$ 2ec2 ec2-184-73-104-242.compute-1.amazonaws.com
  Last login: Fri Feb 11 00:41:11 2011 from 10.100.243.195
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ ll
  total 8
  -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
  -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ ll .ssh
  total 16
  -rw------- 1 tsakai tsakai  232 Feb  5 23:19 authorized_keys
  -rw------- 1 tsakai tsakai  102 Feb 11 00:34 config
  -rw-r--r-- 1 tsakai tsakai 1302 Feb 11 00:36 known_hosts
  -rw------- 1 tsakai tsakai  887 Feb  8 22:03 tsakai
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ ssh ip-10-100-243-195.ec2.internal
  Last login: Fri Feb 11 00:36:20 2011 from 10.195.198.31
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ # I am on machine B
  [tsakai@ip-10-100-243-195 ~]$ hostname
  ip-10-100-243-195
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ ll
  total 8
  -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:44 app.ac
  -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:47 fib.R
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ cat app.ac
  -H ip-10-195-198-31.ec2.internal -np 1 Rscript /home/tsakai/fib.R 5
  -H ip-10-195-198-31.ec2.internal -np 1 Rscript /home/tsakai/fib.R 6
  -H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 7
  -H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 8
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ # go back to machine A
  [tsakai@ip-10-100-243-195 ~]$
  [tsakai@ip-10-100-243-195 ~]$ exit
  logout
  Connection to ip-10-100-243-195.ec2.internal closed.
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ hostname
  ip-10-195-198-31
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ # Execute mpirun
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ mpirun -app app.ac
  --------------------------------------------------------------------------
  mpirun was unable to launch the specified application as it encountered an
error:

  Error: pipe function call failed when setting up I/O forwarding subsystem
  Node: ip-10-195-198-31

  while attempting to start process rank 0.
  --------------------------------------------------------------------------
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ # try it as root
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ sudo su
  bash-3.2#
  bash-3.2# pwd
  /home/tsakai
  bash-3.2#
  bash-3.2# ls -l /root/.ssh/config
  -rw------- 1 root root 103 Feb 11 00:56 /root/.ssh/config
  bash-3.2#
  bash-3.2# cat /root/.ssh/config
  Host *
          IdentityFile /root/.ssh/.derobee/.kagi
          IdentitiesOnly yes
          BatchMode yes
  bash-3.2#
  bash-3.2# pwd
  /home/tsakai
  bash-3.2#
  bash-3.2# ls -l
  total 8
  -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
  -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
  bash-3.2#
  bash-3.2# # now is the time for mpirun
  bash-3.2#
  bash-3.2# mpirun --app ./app.ac
  13 ip-10-100-243-195
  21 ip-10-100-243-195
  5 ip-10-195-198-31
  8 ip-10-195-198-31
  bash-3.2#
  bash-3.2# # It works (being root)!
  bash-3.2#
  bash-3.2# exit
  exit
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ # try it one more time as tsakai
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ mpirun --app app.ac
  --------------------------------------------------------------------------
  mpirun was unable to launch the specified application as it encountered an
error:

  Error: pipe function call failed when setting up I/O forwarding subsystem
  Node: ip-10-195-198-31

  while attempting to start process rank 0.
  --------------------------------------------------------------------------
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ # I don't get it.
  [tsakai@ip-10-195-198-31 ~]$
  [tsakai@ip-10-195-198-31 ~]$ exit
  logout
  [tsakai@vixen ec2]$

So, why does it say "pipe function call failed when setting up
I/O forwarding subsystem  Node: ip-10-195-198-31"?
The node it is referring to is not the remote machine; it is
what I call machine A.  I first thought maybe this was a problem
with the PATH variable, but I don't think so: I compared root's
PATH to tsakai's, made them identical, and retried.
I got the same behavior.
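
One thing I can try next is to compare the two environments more
broadly than PATH.  Below is only a sketch; the notion that some
per-user resource limit (for instance the open-file limit that the
pipe call depends on) differs between root and tsakai is just a
guess on my part:

  # run these two lines once as tsakai and once under "sudo su"
  ulimit -a   > /tmp/limits.$(id -un)
  env | sort  > /tmp/env.$(id -un)

  # then, after both runs, compare:
  diff /tmp/limits.root /tmp/limits.tsakai
  diff /tmp/env.root    /tmp/env.tsakai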

If you could enlighten me as to why this is happening, I would really
appreciate it.

Thank you.

Tena


On 2/10/11 4:12 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:

> Hi Jeff,
>
> Thanks for the firewall tip.  I tried it while allowing all TCP traffic
> and got an interesting and perplexing result.  Here's what's interesting
> (BTW, I got rid of "LogLevel DEBUG3" from my .ssh/config on this run):
>
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ mpirun --app app.ac2
>    Host key verification failed.
>
> --------------------------------------------------------------------------
>    A daemon (pid 2743) died unexpectedly with status 255 while attempting
>    to launch so we are aborting.
>
>    There may be more information reported by the environment (see above).
>
>    This may be because the daemon was unable to find all the needed shared
>    libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
>    location of the shared libraries on the remote nodes and this will
>    automatically be forwarded to the remote nodes.
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
>    mpirun noticed that the job aborted, but has no info as to the process
>    that caused that situation.
>
> --------------------------------------------------------------------------
>    mpirun: clean termination accomplished
>
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ env | grep LD_LIB
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ # Let's set LD_LIBRARY_PATH to
> /usr/local/lib
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ # I better to this on machine B as well
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ ssh -i tsakai ip-10-195-171-159
>    Warning: Identity file tsakai not accessible: No such file or directory.
>    Last login: Thu Feb 10 18:31:20 2011 from 10.203.21.132
>    [tsakai@ip-10-195-171-159 ~]$
>    [tsakai@ip-10-195-171-159 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
>    [tsakai@ip-10-195-171-159 ~]$
>    [tsakai@ip-10-195-171-159 ~]$ env | grep LD_LIB
>    LD_LIBRARY_PATH=/usr/local/lib
>    [tsakai@ip-10-195-171-159 ~]$
>    [tsakai@ip-10-195-171-159 ~]$ # OK, now go bak to machine A
>    [tsakai@ip-10-195-171-159 ~]$ exit
>    logout
>    Connection to ip-10-195-171-159 closed.
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ hostname
>    ip-10-203-21-132
>    [tsakai@ip-10-203-21-132 ~]$ # try mpirun again
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ mpirun --app app.ac2
>    Host key verification failed.
>
> --------------------------------------------------------------------------
>    A daemon (pid 2789) died unexpectedly with status 255 while attempting
>    to launch so we are aborting.
>
>    There may be more information reported by the environment (see above).
>
>    This may be because the daemon was unable to find all the needed shared
>    libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
>    location of the shared libraries on the remote nodes and this will
>    automatically be forwarded to the remote nodes.
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
>    mpirun noticed that the job aborted, but has no info as to the process
>    that caused that situation.
>
> --------------------------------------------------------------------------
>    mpirun: clean termination accomplished
>
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ # I thought openmpi library was in
> /usr/local/lib...
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ ll -t /usr/local/lib | less
>    total 16604
>    lrwxrwxrwx 1 root root      16 Feb  8 23:06 libfuse.so ->
> libfuse.so.2.8.5
>    lrwxrwxrwx 1 root root      16 Feb  8 23:06 libfuse.so.2 ->
> libfuse.so.2.8.5
>    lrwxrwxrwx 1 root root      25 Feb  8 23:06 libmca_common_sm.so ->
> libmca_common_sm.so.1.0.0
>    lrwxrwxrwx 1 root root      25 Feb  8 23:06 libmca_common_sm.so.1 ->
> libmca_common_sm.so.1.0.0
>    lrwxrwxrwx 1 root root      15 Feb  8 23:06 libmpi.so -> libmpi.so.0.0.2
>    lrwxrwxrwx 1 root root      15 Feb  8 23:06 libmpi.so.0 ->
> libmpi.so.0.0.2
>    lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_cxx.so ->
> libmpi_cxx.so.0.0.1
>    lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_cxx.so.0 ->
> libmpi_cxx.so.0.0.1
>    lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f77.so ->
> libmpi_f77.so.0.0.1
>    lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f77.so.0 ->
> libmpi_f77.so.0.0.1
>    lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f90.so ->
> libmpi_f90.so.0.0.1
>    lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f90.so.0 ->
> libmpi_f90.so.0.0.1
>    lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-pal.so ->
> libopen-pal.so.0.0.0
>    lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-pal.so.0 ->
> libopen-pal.so.0.0.0
>    lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-rte.so ->
> libopen-rte.so.0.0.0
>    lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-rte.so.0 ->
> libopen-rte.so.0.0.0
>    lrwxrwxrwx 1 root root      26 Feb  8 23:06 libopenmpi_malloc.so ->
> libopenmpi_malloc.so.0.0.0
>    lrwxrwxrwx 1 root root      26 Feb  8 23:06 libopenmpi_malloc.so.0 ->
> libopenmpi_malloc.so.0.0.0
>    lrwxrwxrwx 1 root root      20 Feb  8 23:06 libulockmgr.so ->
> libulockmgr.so.1.0.1
>    lrwxrwxrwx 1 root root      20 Feb  8 23:06 libulockmgr.so.1 ->
> libulockmgr.so.1.0.1
>    lrwxrwxrwx 1 root root      16 Feb  8 23:06 libxml2.so ->
> libxml2.so.2.7.2
>    lrwxrwxrwx 1 root root      16 Feb  8 23:06 libxml2.so.2 ->
> libxml2.so.2.7.2
>    -rw-r--r-- 1 root root  385912 Jan 26 01:00 libvt.a
>    [tsakai@ip-10-203-21-132 ~]$
>    [tsakai@ip-10-203-21-132 ~]$ # Now, I am really confused...
>    [tsakai@ip-10-203-21-132 ~]$
>
> Do you know why it's complaining about shared libraries?
>
> Thank you.
>
> Tena
>
>
> On 2/10/11 1:05 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>
>> Your prior mails were about ssh issues, but this one sounds like you might
>> have firewall issues.
>>
>> That is, the "orted" command attempts to open a TCP socket back to mpirun for
>> various command and control reasons.  If it is blocked from doing so by a
>> firewall, Open MPI won't run.  In general, you can either disable your
>> firewall or you can setup a trust relationship for TCP connections within
>> your
>> cluster.
>>
>>
>>
>> On Feb 10, 2011, at 1:03 PM, Tena Sakai wrote:
>>
>>> Hi Reuti,
>>>
>>> Thanks for suggesting "LogLevel DEBUG3."  I did so and complete
>>> session is captured in the attached file.
>>>
>>> What I did is very similar to what I have done before: verify
>>> that ssh works and then run the mpirun command.  In my somewhat lengthy
>>> session log, there are two responses from "LogLevel DEBUG3": first
>>> from an scp invocation and then from the mpirun invocation.  They both
>>> say
>>>    debug1: Authentication succeeded (publickey).
>>>
>>> From the mpirun invocation, I see a line:
>>>
>>>    debug1: Sending command:  orted --daemonize -mca ess env -mca
>>> orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs
>>>    2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"
>>> The IP address at the end of the line is indeed that of machine B.
>>> After that it hung and I control-C'd out of it, which
>>> gave me more lines.  But the lines after
>>>    debug1: Sending command:  orted bla bla bla
>>> don't look good to me.  In truth, though, I have no idea what they
>>> mean.
>>>
>>> If you could shed some light, I would appreciate it very much.
>>>
>>> Regards,
>>>
>>> Tena
>>>
>>>
>>> On 2/10/11 10:57 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> Am 10.02.2011 um 19:11 schrieb Tena Sakai:
>>>>
>>>>>> your local machine is Linux like, but the execution hosts
>>>>>> are Macs? I saw the /Users/tsakai/... in your output.
>>>>>
>>>>> No, my environment is entirely linux.  The path to my home
>>>>> directory on one host (blitzen) has been known as /Users/tsakai,
>>>>> even though it is an nfs mount from vixen (where it is known
>>>>> as /home/tsakai).  For historical reasons, I have
>>>>> chosen to make a symbolic link named /Users to vixen's /home,
>>>>> so that I can use a consistent path on both vixen and blitzen.
>>>>
>>>> okay. Sometimes the protection of the home directory must be adjusted too,
>>>> but
>>>> as you can do it from the command line this shouldn't be an issue.
>>>>
>>>>
>>>>>> Is this a private cluster (or at least private interfaces)?
>>>>>> It would also be an option to use hostbased authentication,
>>>>>> which will avoid setting any known_hosts file or passphraseless
>>>>>> ssh-keys for each user.
>>>>>
>>>>> No, it is not a private cluster.  It is Amazon EC2.  When I
>>>>> ssh from my local machine (vixen) I use its public interface,
>>>>> but to address one amazon cluster node from the other I
>>>>> use the nodes' private dns names: domU-12-31-39-07-35-21 and
>>>>> domU-12-31-39-06-74-E2.  Both public and private dns names
>>>>> change from one launch to another.  I am using passphraseless
>>>>> ssh-keys for authentication in all cases, i.e., from vixen to
>>>>> amazon node A, from amazon node A to amazon node B, and from
>>>>> amazon node B back to A.  (Please see my initial post.  There
>>>>> is a session dialogue for this.)  They all work without an
>>>>> authentication dialogue, except for a brief initial dialogue:
>>>>>   The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
>>>>>   can't be established.
>>>>>    RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>    Are you sure you want to continue connecting (yes/no)?
>>>>> to which I say "yes."
>>>>> But I am unclear about what you mean by "hostbased authentication".
>>>>> Doesn't that mean with a password?  If so, it is not an option.
>>>>
>>>> No. It's convenient inside a private cluster as it won't fill each user's
>>>> known_hosts file and you don't need to create any ssh-keys. But when the
>>>> hostname changes every time it might also create new hostkeys. It uses
>>>> hostkeys (private and public); this way it works for all users. Just for
>>>> reference:
>>>>
>>>> http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
>>>>
>>>> You could look into it later.
>>>>
>>>> ==
>>>>
>>>> - Can you try to use a command when connecting from A to B? E.g.
>>>> ssh domU-12-31-39-06-74-E2 ls. Is this working too?
>>>>
>>>> - What about putting:
>>>>
>>>> LogLevel DEBUG3
>>>>
>>>> in your ~/.ssh/config?  Maybe we can see in verbose mode what it's trying
>>>> to negotiate before it fails.
>>>>
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>
>>>>> Regards,
>>>>>
>>>>> Tena
>>>>>
>>>>>
>>>>> On 2/10/11 2:27 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> your local machine is Linux like, but the execution hosts are Macs? I saw
>>>>>> the
>>>>>> /Users/tsakai/... in your output.
>>>>>>
>>>>>> a) executing a command on them is also working, e.g.: ssh
>>>>>> domU-12-31-39-07-35-21 ls
>>>>>>
>>>>>> Am 10.02.2011 um 07:08 schrieb Tena Sakai:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have made a bit of progress(?)...
>>>>>>> I made a config file in my .ssh directory on the cloud.  It looks like:
>>>>>>>   # machine A
>>>>>>>   Host domU-12-31-39-07-35-21.compute-1.internal
>>>>>>
>>>>>> This is just an abbreviation or nickname above. To use the specified
>>>>>> settings,
>>>>>> it's necessary to specify exactly this name. When the settings are the
>>>>>> same
>>>>>> anyway for all machines, you can use:
>>>>>>
>>>>>> Host *
>>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>   IdentitiesOnly yes
>>>>>>   BatchMode yes
>>>>>>
>>>>>> instead.
>>>>>>
>>>>>> Is this a private cluster (or at least private interfaces)? It would also
>>>>>> be
>>>>>> an option to use hostbased authentication, which will avoid setting any
>>>>>> known_hosts file or passphraseless ssh-keys for each user.
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>>   HostName domU-12-31-39-07-35-21
>>>>>>>   BatchMode yes
>>>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>   ChallengeResponseAuthentication no
>>>>>>>   IdentitiesOnly yes
>>>>>>>
>>>>>>>   # machine B
>>>>>>>   Host domU-12-31-39-06-74-E2.compute-1.internal
>>>>>>>   HostName domU-12-31-39-06-74-E2
>>>>>>>   BatchMode yes
>>>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>   ChallengeResponseAuthentication no
>>>>>>>   IdentitiesOnly yes
>>>>>>>
>>>>>>> This file exists on both machine A and machine B.
>>>>>>>
>>>>>>> Now when I issue the mpirun command as below:
>>>>>>>   [tsakai@domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>>>>>
>>>>>>> It hangs.  I control-C out of it and I get:
>>>>>>>   mpirun: killing job...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>   mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>   that caused that situation.
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>   mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>>>>   below. Additional manual cleanup may be required - please refer to
>>>>>>>   the "orte-clean" tool for assistance.
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>       domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
>>>>>>> back when launched
>>>>>>>
>>>>>>> Am I making progress?
>>>>>>>
>>>>>>> Does this mean I am past authentication and something else is the
>>>>>>> problem?
>>>>>>> Does someone have an example .ssh/config file I can look at?  There are
>>>>>>> so many keyword-argument pairs for this config file and I would like to
>>>>>>> look at some very basic one that works.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Tena Sakai
>>>>>>> tsa...@gallo.ucsf.edu
>>>>>>>
>>>>>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I have an app.ac1 file like below:
>>>>>>>>   [tsakai@vixen local]$ cat app.ac1
>>>>>>>>   -H vixen.egcrc.org   -np 1 Rscript
>>>>>>>> /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>>>>>>   -H vixen.egcrc.org   -np 1 Rscript
>>>>>>>> /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>>>>>>   -H blitzen.egcrc.org -np 1 Rscript
>>>>>>>> /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>>>>>>   -H blitzen.egcrc.org -np 1 Rscript
>>>>>>>> /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>>>>>
>>>>>>>> The program I run is
>>>>>>>>   Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>>>>>> Where x is [5..8].  The machines vixen and blitzen each run 2 runs.
>>>>>>>>
>>>>>>>> Here's the program fib.R:
>>>>>>>>   [tsakai@vixen local]$ cat fib.R
>>>>>>>>       # fib() computes, given index n, fibonacci number iteratively
>>>>>>>>       # here's the first dozen sequence (indexed from 0..11)
>>>>>>>>       # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>>>>>
>>>>>>>>   fib <- function( n ) {
>>>>>>>>           a <- 0
>>>>>>>>           b <- 1
>>>>>>>>           for ( i in 1:n ) {
>>>>>>>>                t <- b
>>>>>>>>                b <- a
>>>>>>>>                a <- a + t
>>>>>>>>           }
>>>>>>>>       a
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   arg <- commandArgs( TRUE )
>>>>>>>>   myHost <- system( 'hostname', intern=TRUE )
>>>>>>>>   cat( fib(arg), myHost, '\n' )
>>>>>>>>
>>>>>>>> It reads an argument from command line and produces a fibonacci number
>>>>>>>> that
>>>>>>>> corresponds to that index, followed by the machine name.  Pretty simple
>>>>>>>> stuff.
>>>>>>>>
>>>>>>>> Here's the run output:
>>>>>>>>   [tsakai@vixen local]$ mpirun -app app.ac1
>>>>>>>>   5 vixen.egcrc.org
>>>>>>>>   8 vixen.egcrc.org
>>>>>>>>   13 blitzen.egcrc.org
>>>>>>>>   21 blitzen.egcrc.org
>>>>>>>>
>>>>>>>> Which is exactly what I expect.  So far so good.
>>>>>>>>
>>>>>>>> Now I want to run the same thing on the cloud.  I launch 2 instances of
>>>>>>>> the same virtual machine, which I get to by:
>>>>>>>>   [tsakai@vixen local]$ ssh -A -i ~/.ssh/tsakai
>>>>>>>> machine-instance-A-public-dns
>>>>>>>>
>>>>>>>> Now I am on machine A:
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B
>>>>>>>> without
>>>>>>>> password authentication,
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>   domU-12-31-39-00-D1-F2
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai
>>>>>>>> domU-12-31-39-0C-C8-01
>>>>>>>>   Last login: Wed Feb  9 20:51:48 2011 from 10.254.214.4
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ hostname
>>>>>>>>   domU-12-31-39-0C-C8-01
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A
>>>>>>>> without using password
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai
>>>>>>>> domU-12-31-39-00-D1-F2
>>>>>>>>   The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)'
>>>>>>>> can't
>>>>>>>> be established.
>>>>>>>>   RSA key fingerprint is
>>>>>>>> e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>>>>   Are you sure you want to continue connecting (yes/no)? yes
>>>>>>>>   Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list
>>>>>>>> of
>>>>>>>> known hosts.
>>>>>>>>   Last login: Wed Feb  9 20:49:34 2011 from 10.215.203.239
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>   domU-12-31-39-00-D1-F2
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ exit
>>>>>>>>   logout
>>>>>>>>   Connection to domU-12-31-39-00-D1-F2 closed.
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ exit
>>>>>>>>   logout
>>>>>>>>   Connection to domU-12-31-39-0C-C8-01 closed.
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>   domU-12-31-39-00-D1-F2
>>>>>>>>
>>>>>>>> As you can see, neither machine uses password for authentication; it
>>>>>>>> uses
>>>>>>>> public/private key pairs.  There is no problem (that I can see) for ssh
>>>>>>>> invocation
>>>>>>>> from one machine to the other.  This is so because I have a copy of
>>>>>>>> public
>>>>>>>> key
>>>>>>>> and a copy of private key on each instance.
>>>>>>>>
>>>>>>>> The app.ac file is identical, except the node names:
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>>>>>>   -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>>>>>>   -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>>>>>>   -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>>>>>>   -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>>>>>
>>>>>>>> Here's what happens with mpirun:
>>>>>>>>
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>>>>>>   tsakai@domu-12-31-39-0c-c8-01's password:
>>>>>>>>   Permission denied, please try again.
>>>>>>>>   tsakai@domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>   mpirun noticed that the job aborted, but has no info as to the
>>>>>>>> process
>>>>>>>>   that caused that situation.
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>   mpirun: clean termination accomplished
>>>>>>>>
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>
>>>>>>>> Mpirun (or somebody else?) asks me for a password, which I don't have.
>>>>>>>> I end up typing control-C.
>>>>>>>>
>>>>>>>> Here's my question:
>>>>>>>> How can I get past authentication by mpirun where there is no password?
>>>>>>>>
>>>>>>>> I would appreciate your help/insight greatly.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Tena Sakai
>>>>>>>> tsa...@gallo.ucsf.edu
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> <session4Reuti.text>_______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

