Hi,

I have made a bit more progress. I think I can say the ssh authentication
problem is behind me now. I am still having a problem running mpirun, but
the latest discovery, which I can reproduce, is that I can run mpirun as
root. Here's the session log:
[tsakai@vixen ec2]$ 2ec2 ec2-184-73-104-242.compute-1.amazonaws.com
Last login: Fri Feb 11 00:41:11 2011 from 10.100.243.195
[tsakai@ip-10-195-198-31 ~]$
[tsakai@ip-10-195-198-31 ~]$ ll
total 8
-rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
-rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
[tsakai@ip-10-195-198-31 ~]$
[tsakai@ip-10-195-198-31 ~]$ ll .ssh
total 16
-rw------- 1 tsakai tsakai  232 Feb  5 23:19 authorized_keys
-rw------- 1 tsakai tsakai  102 Feb 11 00:34 config
-rw-r--r-- 1 tsakai tsakai 1302 Feb 11 00:36 known_hosts
-rw------- 1 tsakai tsakai  887 Feb  8 22:03 tsakai
[tsakai@ip-10-195-198-31 ~]$
[tsakai@ip-10-195-198-31 ~]$ ssh ip-10-100-243-195.ec2.internal
Last login: Fri Feb 11 00:36:20 2011 from 10.195.198.31
[tsakai@ip-10-100-243-195 ~]$
[tsakai@ip-10-100-243-195 ~]$ # I am on machine B
[tsakai@ip-10-100-243-195 ~]$ hostname
ip-10-100-243-195
[tsakai@ip-10-100-243-195 ~]$
[tsakai@ip-10-100-243-195 ~]$ ll
total 8
-rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:44 app.ac
-rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:47 fib.R
[tsakai@ip-10-100-243-195 ~]$
[tsakai@ip-10-100-243-195 ~]$ cat app.ac
-H ip-10-195-198-31.ec2.internal  -np 1 Rscript /home/tsakai/fib.R 5
-H ip-10-195-198-31.ec2.internal  -np 1 Rscript /home/tsakai/fib.R 6
-H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 7
-H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 8
[tsakai@ip-10-100-243-195 ~]$
[tsakai@ip-10-100-243-195 ~]$ # go back to machine A
[tsakai@ip-10-100-243-195 ~]$ exit
logout
Connection to ip-10-100-243-195.ec2.internal closed.
[tsakai@ip-10-195-198-31 ~]$
[tsakai@ip-10-195-198-31 ~]$ hostname
ip-10-195-198-31
[tsakai@ip-10-195-198-31 ~]$
[tsakai@ip-10-195-198-31 ~]$ # Execute mpirun
[tsakai@ip-10-195-198-31 ~]$ mpirun -app app.ac
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it encountered
an error:

Error: pipe function call failed when setting up I/O forwarding subsystem
Node: ip-10-195-198-31

while attempting to start process rank 0.
--------------------------------------------------------------------------
[tsakai@ip-10-195-198-31 ~]$
[tsakai@ip-10-195-198-31 ~]$ # try it as root
[tsakai@ip-10-195-198-31 ~]$ sudo su
bash-3.2#
bash-3.2# pwd
/home/tsakai
bash-3.2#
bash-3.2# ls -l /root/.ssh/config
-rw------- 1 root root 103 Feb 11 00:56 /root/.ssh/config
bash-3.2#
bash-3.2# cat /root/.ssh/config
Host *
        IdentityFile /root/.ssh/.derobee/.kagi
        IdentitiesOnly yes
        BatchMode yes
bash-3.2#
bash-3.2# pwd
/home/tsakai
bash-3.2#
bash-3.2# ls -l
total 8
-rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
-rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
bash-3.2#
bash-3.2# # now is the time for mpirun
bash-3.2# mpirun --app ./app.ac
13 ip-10-100-243-195
21 ip-10-100-243-195
5 ip-10-195-198-31
8 ip-10-195-198-31
bash-3.2#
bash-3.2# # It works (being root)!
bash-3.2# exit
exit
[tsakai@ip-10-195-198-31 ~]$
[tsakai@ip-10-195-198-31 ~]$ # try it one more time as tsakai
[tsakai@ip-10-195-198-31 ~]$ mpirun --app app.ac
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it encountered
an error:

Error: pipe function call failed when setting up I/O forwarding subsystem
Node: ip-10-195-198-31

while attempting to start process rank 0.
--------------------------------------------------------------------------
[tsakai@ip-10-195-198-31 ~]$
[tsakai@ip-10-195-198-31 ~]$ # I don't get it.
[tsakai@ip-10-195-198-31 ~]$ exit
logout
[tsakai@vixen ec2]$

So, why does it say "pipe function call failed when setting up I/O
forwarding subsystem Node: ip-10-195-198-31"? The node it is referring to
is not the remote machine; it is what I call machine A. I first thought
maybe this was a problem with the PATH variable, but I don't think so:
I compared root's PATH to tsakai's, made them identical, and retried.
I got the same behavior.

If you could enlighten me as to why this is happening, I would really
appreciate it.

Thank you.

Tena

On 2/10/11 4:12 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:

> Hi Jeff,
>
> Thanks for the firewall tip. I tried it while allowing all TCP traffic
> and got an interesting and perplexing result. Here's what's interesting
> (BTW, I got rid of "LogLevel DEBUG3" from ~/.ssh/config on this run):
>
> [tsakai@ip-10-203-21-132 ~]$
> [tsakai@ip-10-203-21-132 ~]$ mpirun --app app.ac2
> Host key verification failed.
> --------------------------------------------------------------------------
> A daemon (pid 2743) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> [tsakai@ip-10-203-21-132 ~]$
> [tsakai@ip-10-203-21-132 ~]$ env | grep LD_LIB
> [tsakai@ip-10-203-21-132 ~]$
> [tsakai@ip-10-203-21-132 ~]$ # Let's set LD_LIBRARY_PATH to /usr/local/lib
> [tsakai@ip-10-203-21-132 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
> [tsakai@ip-10-203-21-132 ~]$
> [tsakai@ip-10-203-21-132 ~]$ # I'd better do this on machine B as well
> [tsakai@ip-10-203-21-132 ~]$ ssh -i tsakai ip-10-195-171-159
> Warning: Identity file tsakai not accessible: No such file or directory.
> Last login: Thu Feb 10 18:31:20 2011 from 10.203.21.132
> [tsakai@ip-10-195-171-159 ~]$
> [tsakai@ip-10-195-171-159 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
> [tsakai@ip-10-195-171-159 ~]$ env | grep LD_LIB
> LD_LIBRARY_PATH=/usr/local/lib
> [tsakai@ip-10-195-171-159 ~]$
> [tsakai@ip-10-195-171-159 ~]$ # OK, now go back to machine A
> [tsakai@ip-10-195-171-159 ~]$ exit
> logout
> Connection to ip-10-195-171-159 closed.
> [tsakai@ip-10-203-21-132 ~]$
> [tsakai@ip-10-203-21-132 ~]$ hostname
> ip-10-203-21-132
> [tsakai@ip-10-203-21-132 ~]$ # try mpirun again
> [tsakai@ip-10-203-21-132 ~]$ mpirun --app app.ac2
> Host key verification failed.
> --------------------------------------------------------------------------
> A daemon (pid 2789) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
>
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> [tsakai@ip-10-203-21-132 ~]$
> [tsakai@ip-10-203-21-132 ~]$ # I thought openmpi library was in /usr/local/lib...
> [tsakai@ip-10-203-21-132 ~]$ ll -t /usr/local/lib | less
> total 16604
> lrwxrwxrwx 1 root root     16 Feb  8 23:06 libfuse.so -> libfuse.so.2.8.5
> lrwxrwxrwx 1 root root     16 Feb  8 23:06 libfuse.so.2 -> libfuse.so.2.8.5
> lrwxrwxrwx 1 root root     25 Feb  8 23:06 libmca_common_sm.so -> libmca_common_sm.so.1.0.0
> lrwxrwxrwx 1 root root     25 Feb  8 23:06 libmca_common_sm.so.1 -> libmca_common_sm.so.1.0.0
> lrwxrwxrwx 1 root root     15 Feb  8 23:06 libmpi.so -> libmpi.so.0.0.2
> lrwxrwxrwx 1 root root     15 Feb  8 23:06 libmpi.so.0 -> libmpi.so.0.0.2
> lrwxrwxrwx 1 root root     19 Feb  8 23:06 libmpi_cxx.so -> libmpi_cxx.so.0.0.1
> lrwxrwxrwx 1 root root     19 Feb  8 23:06 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.1
> lrwxrwxrwx 1 root root     19 Feb  8 23:06 libmpi_f77.so -> libmpi_f77.so.0.0.1
> lrwxrwxrwx 1 root root     19 Feb  8 23:06 libmpi_f77.so.0 -> libmpi_f77.so.0.0.1
> lrwxrwxrwx 1 root root     19 Feb  8 23:06 libmpi_f90.so -> libmpi_f90.so.0.0.1
> lrwxrwxrwx 1 root root     19 Feb  8 23:06 libmpi_f90.so.0 -> libmpi_f90.so.0.0.1
> lrwxrwxrwx 1 root root     20 Feb  8 23:06 libopen-pal.so -> libopen-pal.so.0.0.0
> lrwxrwxrwx 1 root root     20 Feb  8 23:06 libopen-pal.so.0 -> libopen-pal.so.0.0.0
> lrwxrwxrwx 1 root root     20 Feb  8 23:06 libopen-rte.so -> libopen-rte.so.0.0.0
> lrwxrwxrwx 1 root root     20 Feb  8 23:06 libopen-rte.so.0 -> libopen-rte.so.0.0.0
> lrwxrwxrwx 1 root root     26 Feb  8 23:06 libopenmpi_malloc.so -> libopenmpi_malloc.so.0.0.0
> lrwxrwxrwx 1 root root     26 Feb  8 23:06 libopenmpi_malloc.so.0 -> libopenmpi_malloc.so.0.0.0
> lrwxrwxrwx 1 root root     20 Feb  8 23:06 libulockmgr.so -> libulockmgr.so.1.0.1
> lrwxrwxrwx 1 root root     20 Feb  8 23:06 libulockmgr.so.1 -> libulockmgr.so.1.0.1
> lrwxrwxrwx 1 root root     16 Feb  8 23:06 libxml2.so -> libxml2.so.2.7.2
> lrwxrwxrwx 1 root root     16 Feb  8 23:06 libxml2.so.2 -> libxml2.so.2.7.2
> -rw-r--r-- 1 root root 385912 Jan 26 01:00 libvt.a
> [tsakai@ip-10-203-21-132 ~]$
> [tsakai@ip-10-203-21-132 ~]$ # Now, I am really confused...
> [tsakai@ip-10-203-21-132 ~]$
>
> Do you know why it's complaining about shared libraries?
>
> Thank you.
>
> Tena
>
> On 2/10/11 1:05 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>
>> Your prior mails were about ssh issues, but this one sounds like you might
>> have firewall issues.
>>
>> That is, the "orted" command attempts to open a TCP socket back to mpirun for
>> various command and control reasons. If it is blocked from doing so by a
>> firewall, Open MPI won't run. In general, you can either disable your
>> firewall or you can set up a trust relationship for TCP connections within
>> your cluster.
>>
>> On Feb 10, 2011, at 1:03 PM, Tena Sakai wrote:
>>
>>> Hi Reuti,
>>>
>>> Thanks for suggesting "LogLevel DEBUG3." I did so, and the complete
>>> session is captured in the attached file.
>>>
>>> What I did is very similar to what I have done before: verify
>>> that ssh works and then run the mpirun command. In my somewhat lengthy
>>> session log, there are two responses from "LogLevel DEBUG3": first
>>> from an scp invocation and then from the mpirun invocation. They both
>>> say
>>>   debug1: Authentication succeeded (publickey).
>>>
>>> From the mpirun invocation, I see a line:
>>>
>>>   debug1: Sending command: orted --daemonize -mca ess env -mca
>>>   orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs
>>>   2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"
>>>
>>> The IP address at the end of the line is indeed that of machine B.
>>> After that it hung, and I controlled-C out of it, which gave me more
>>> lines. But the lines after
>>>   debug1: Sending command: orted bla bla bla
>>> don't look good to me. But, in truth, I have no idea what they mean.
>>>
>>> If you could shed some light, I would appreciate it very much.
>>>
>>> Regards,
>>>
>>> Tena
>>>
>>> On 2/10/11 10:57 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> Am 10.02.2011 um 19:11 schrieb Tena Sakai:
>>>>
>>>>>> your local machine is Linux like, but the execution hosts
>>>>>> are Macs? I saw the /Users/tsakai/... in your output.
>>>>>
>>>>> No, my environment is entirely Linux. The path to my home
>>>>> directory on one host (blitzen) has been known as /Users/tsakai,
>>>>> despite it being an nfs mount from vixen (which knows it as
>>>>> /home/tsakai). For historical reasons, I have chosen to make a
>>>>> symbolic link named /Users to vixen's /home, so that I can use
>>>>> a consistent path on both vixen and blitzen.
>>>>
>>>> okay. Sometimes the protection of the home directory must be adjusted too,
>>>> but as you can do it from the command line this shouldn't be an issue.
>>>>
>>>>>> Is this a private cluster (or at least private interfaces)?
>>>>>> It would also be an option to use hostbased authentication,
>>>>>> which will avoid setting any known_hosts file or passphraseless
>>>>>> ssh-keys for each user.
>>>>>
>>>>> No, it is not a private cluster. It is Amazon EC2. When I
>>>>> ssh from my local machine (vixen) I use its public interface,
>>>>> but to address one Amazon cluster node from the other I
>>>>> use the nodes' private dns names: domU-12-31-39-07-35-21 and
>>>>> domU-12-31-39-06-74-E2. Both public and private dns names
>>>>> change from one launch to another. I am using passphraseless
>>>>> ssh-keys for authentication in all cases, i.e., from vixen to
>>>>> Amazon node A, from Amazon node A to Amazon node B, and from
>>>>> Amazon node B back to A. (Please see my initial post. There
>>>>> is a session dialogue for this.) They all work without an
>>>>> authentication dialogue, except a brief initial dialogue:
>>>>>   The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
>>>>>   can't be established.
>>>>>   RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>   Are you sure you want to continue connecting (yes/no)?
>>>>> to which I say "yes."
>>>>> But I am unclear on what you mean by "hostbased authentication".
>>>>> Doesn't that mean with a password? If so, it is not an option.
>>>>
>>>> No. It's convenient inside a private cluster as it won't fill each user's
>>>> known_hosts file and you don't need to create any ssh-keys. But when the
>>>> hostname changes every time it might also create new hostkeys. It uses
>>>> hostkeys (private and public), this way it works for all users. Just for
>>>> reference:
>>>>
>>>> http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
>>>>
>>>> You could look into it later.
>>>>
>>>> ==
>>>>
>>>> - Can you try to use a command when connecting from A to B? E.g.
>>>>   `ssh domU-12-31-39-06-74-E2 ls`. Is this working too?
>>>>
>>>> - What about putting:
>>>>
>>>>   LogLevel DEBUG3
>>>>
>>>>   in your ~/.ssh/config. Maybe we can see what he's trying to negotiate
>>>>   before it fails in verbose mode.
>>>>
>>>> -- Reuti
>>>>
>>>>> Regards,
>>>>>
>>>>> Tena
>>>>>
>>>>> On 2/10/11 2:27 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> your local machine is Linux like, but the execution hosts are Macs?
>>>>>> I saw the /Users/tsakai/... in your output.
>>>>>>
>>>>>> a) executing a command on them is also working, e.g.:
>>>>>>    ssh domU-12-31-39-07-35-21 ls
>>>>>>
>>>>>> Am 10.02.2011 um 07:08 schrieb Tena Sakai:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have made a bit of progress(?)...
>>>>>>> I made a config file in my .ssh directory on the cloud. It looks like:
>>>>>>>   # machine A
>>>>>>>   Host domU-12-31-39-07-35-21.compute-1.internal
>>>>>>
>>>>>> This is just an abbreviation or nickname above. To use the specified
>>>>>> settings, it's necessary to specify exactly this name. When the settings
>>>>>> are the same anyway for all machines, you can use:
>>>>>>
>>>>>>   Host *
>>>>>>     IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>     IdentitiesOnly yes
>>>>>>     BatchMode yes
>>>>>>
>>>>>> instead.
>>>>>>
>>>>>> Is this a private cluster (or at least private interfaces)? It would also
>>>>>> be an option to use hostbased authentication, which will avoid setting any
>>>>>> known_hosts file or passphraseless ssh-keys for each user.
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>>   HostName domU-12-31-39-07-35-21
>>>>>>>   BatchMode yes
>>>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>   ChallengeResponseAuthentication no
>>>>>>>   IdentitiesOnly yes
>>>>>>>
>>>>>>>   # machine B
>>>>>>>   Host domU-12-31-39-06-74-E2.compute-1.internal
>>>>>>>   HostName domU-12-31-39-06-74-E2
>>>>>>>   BatchMode yes
>>>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>   ChallengeResponseAuthentication no
>>>>>>>   IdentitiesOnly yes
>>>>>>>
>>>>>>> This file exists on both machine A and machine B.
>>>>>>>
>>>>>>> Now when I issue the mpirun command as below:
>>>>>>>   [tsakai@domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>>>>>
>>>>>>> it hangs.
>>>>>>> I control-C out of it and I get:
>>>>>>>
>>>>>>>   mpirun: killing job...
>>>>>>>
>>>>>>>   --------------------------------------------------------------------------
>>>>>>>   mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>   that caused that situation.
>>>>>>>   --------------------------------------------------------------------------
>>>>>>>   --------------------------------------------------------------------------
>>>>>>>   mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>>>>   below. Additional manual cleanup may be required - please refer to
>>>>>>>   the "orte-clean" tool for assistance.
>>>>>>>   --------------------------------------------------------------------------
>>>>>>>   domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
>>>>>>>   back when launched
>>>>>>>
>>>>>>> Am I making progress?
>>>>>>>
>>>>>>> Does this mean I am past authentication and something else is the
>>>>>>> problem? Does someone have an example .ssh/config file I can look at?
>>>>>>> There are so many keyword-argument pairs for this config file and I
>>>>>>> would like to look at some very basic one that works.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Tena Sakai
>>>>>>> tsa...@gallo.ucsf.edu
>>>>>>>
>>>>>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have an app.ac1 file like below:
>>>>>>>>   [tsakai@vixen local]$ cat app.ac1
>>>>>>>>   -H vixen.egcrc.org   -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>>>>>>   -H vixen.egcrc.org   -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>>>>>>   -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>>>>>>   -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>>>>>
>>>>>>>> The program I run is
>>>>>>>>   Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>>>>>> where x is [5..8]. The machines vixen and blitzen each run 2 runs.
>>>>>>>>
>>>>>>>> Here's the program fib.R:
>>>>>>>>   [tsakai@vixen local]$ cat fib.R
>>>>>>>>   # fib() computes, given index n, fibonacci number iteratively
>>>>>>>>   # here's the first dozen sequence (indexed from 0..11)
>>>>>>>>   # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>>>>>
>>>>>>>>   fib <- function( n ) {
>>>>>>>>       a <- 0
>>>>>>>>       b <- 1
>>>>>>>>       for ( i in 1:n ) {
>>>>>>>>           t <- b
>>>>>>>>           b <- a
>>>>>>>>           a <- a + t
>>>>>>>>       }
>>>>>>>>       a
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   arg <- commandArgs( TRUE )
>>>>>>>>   myHost <- system( 'hostname', intern=TRUE )
>>>>>>>>   cat( fib(arg), myHost, '\n' )
>>>>>>>>
>>>>>>>> It reads an argument from the command line and produces the fibonacci
>>>>>>>> number that corresponds to that index, followed by the machine name.
>>>>>>>> Pretty simple stuff.
>>>>>>>>
>>>>>>>> Here's the run output:
>>>>>>>>   [tsakai@vixen local]$ mpirun -app app.ac1
>>>>>>>>   5 vixen.egcrc.org
>>>>>>>>   8 vixen.egcrc.org
>>>>>>>>   13 blitzen.egcrc.org
>>>>>>>>   21 blitzen.egcrc.org
>>>>>>>>
>>>>>>>> which is exactly what I expect. So far so good.
>>>>>>>>
>>>>>>>> Now I want to run the same thing on the cloud.
>>>>>>>> I launch 2 instances of the same virtual machine, which I get to by:
>>>>>>>>   [tsakai@vixen local]$ ssh -i ~/.ssh/tsakai machine-instance-A-public-dns
>>>>>>>>
>>>>>>>> Now I am on machine A:
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without
>>>>>>>>   password authentication,
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>   domU-12-31-39-00-D1-F2
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>>>>>>>>   Last login: Wed Feb  9 20:51:48 2011 from 10.254.214.4
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ hostname
>>>>>>>>   domU-12-31-39-0C-C8-01
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A
>>>>>>>>   without using password
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>>>>>>>>   The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't
>>>>>>>>   be established.
>>>>>>>>   RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>>>>   Are you sure you want to continue connecting (yes/no)? yes
>>>>>>>>   Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of
>>>>>>>>   known hosts.
>>>>>>>>   Last login: Wed Feb  9 20:49:34 2011 from 10.215.203.239
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>   domU-12-31-39-00-D1-F2
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ exit
>>>>>>>>   logout
>>>>>>>>   Connection to domU-12-31-39-00-D1-F2 closed.
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ exit
>>>>>>>>   logout
>>>>>>>>   Connection to domU-12-31-39-0C-C8-01 closed.
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>   domU-12-31-39-00-D1-F2
>>>>>>>>
>>>>>>>> As you can see, neither machine uses a password for authentication; they
>>>>>>>> use public/private key pairs. There is no problem (that I can see) with
>>>>>>>> ssh invocation from one machine to the other. This is so because I have
>>>>>>>> a copy of the public key and a copy of the private key on each instance.
>>>>>>>>
>>>>>>>> The app.ac file is identical, except for the node names:
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>>>>>>   -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>>>>>>   -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>>>>>>   -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>>>>>>   -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>>>>>
>>>>>>>> Here's what happens with mpirun:
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>>>>>>   tsakai@domu-12-31-39-0c-c8-01's password:
>>>>>>>>   Permission denied, please try again.
>>>>>>>>   tsakai@domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>>>>>
>>>>>>>>   --------------------------------------------------------------------------
>>>>>>>>   mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>>   that caused that situation.
>>>>>>>>   --------------------------------------------------------------------------
>>>>>>>>   mpirun: clean termination accomplished
>>>>>>>>
>>>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>
>>>>>>>> mpirun (or somebody else?) asks me for a password, which I don't have.
>>>>>>>> I end up typing control-C.
>>>>>>>>
>>>>>>>> Here's my question:
>>>>>>>> How can I get past authentication by mpirun when there is no password?
>>>>>>>>
>>>>>>>> I would appreciate your help/insight greatly.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Tena Sakai
>>>>>>>> tsa...@gallo.ucsf.edu
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> <session4Reuti.text>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
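One avenue worth checking for the top-level failure (mpirun succeeds as root but fails for tsakai with "pipe function call failed when setting up I/O forwarding subsystem"): on Linux, the pipe() system call fails with EMFILE once a process exhausts its open-file-descriptor limit, and such resource limits often differ between root and ordinary users. A minimal sketch for comparing the two accounts, using only standard bash builtins; that a low limit is the actual culprit here is an assumption, not something the logs confirm:

```shell
# Print the per-process resource limits most likely to make pipe()/fork()
# fail. Run once as tsakai and once as root, then compare the numbers.
echo "soft open-file limit: $(ulimit -Sn)"
echo "hard open-file limit: $(ulimit -Hn)"
echo "max user processes:   $(ulimit -u)"

# Per-user overrides, if any, typically live in limits.conf on PAM systems
# (path is the conventional location; it may not exist on every distro).
grep -Ev '^(#|$)' /etc/security/limits.conf 2>/dev/null || true
```

If the soft limit for tsakai turns out to be much lower than root's, raising it (e.g. `ulimit -n 4096` before mpirun) would be a cheap experiment.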