Hi Reuti,

Thanks for suggesting "LogLevel DEBUG3." I did so, and the complete session is captured in the attached file.
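The `--hnp-uri` value in the orted command line quoted below appears to be the contact address of mpirun itself (the "head node process"). As far as I understand Open MPI's launch sequence, the orted started on the other node has to open a TCP connection back to that address to report in. If that connection is blocked, for example by an EC2 security group that only admits port 22, mpirun would wait in the way described here. The bash function below is a small sketch, not part of Open MPI, that pulls the address and port out of such a debug line. The name `hnp_endpoint` is hypothetical, and the URI layout it handles is taken only from the single example in this thread.

```bash
#!/usr/bin/env bash
# Sketch: extract the HNP (mpirun) contact address from an orted launch
# line captured with "LogLevel DEBUG3". The URI layout assumed here,
# JOBID.VPID;tcp://IP:PORT, comes from the one line quoted in this thread;
# other Open MPI versions or multi-address URIs may look different.
hnp_endpoint() {
  local line="$1" uri
  # Keep only the double-quoted value that follows --hnp-uri.
  uri=$(printf '%s\n' "$line" | sed -n 's/.*--hnp-uri "\([^"]*\)".*/\1/p')
  [ -n "$uri" ] || return 1
  uri=${uri#*";tcp://"}                        # drop "JOBID.VPID;tcp://"
  printf '%s %s\n' "${uri%:*}" "${uri##*:}"    # IP, then port
}
```

With the address in hand, reachability can be tried from the node where orted runs while mpirun is still waiting, e.g. `nc -z -w 5 10.194.95.239 54256` if netcat is installed. The port seems to be chosen afresh on each mpirun run, so a value taken from an older log is unlikely to help.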
What I did is very similar to what I have done before: verify that ssh works and then run the mpirun command. In my somewhat lengthy session log, there are two responses from "LogLevel DEBUG3": the first from an scp invocation and the second from the mpirun invocation. They both say:

debug1: Authentication succeeded (publickey).

From the mpirun invocation, I see this line:

debug1: Sending command: orted --daemonize -mca ess env -mca orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"

The IP address at the end of the line is indeed that of machine B. After that, it hung, and I pressed Control-C to get out of it, which gave me more lines. The lines after "debug1: Sending command: orted ..." don't look good to me, but, in truth, I have no idea what they mean. If you could shed some light, I would appreciate it very much.

Regards,

Tena

On 2/10/11 10:57 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:

> Hi,
>
> On 10.02.2011 at 19:11, Tena Sakai wrote:
>
>>> your local machine is Linux like, but the execution hosts
>>> are Macs? I saw the /Users/tsakai/... in your output.
>>
>> No, my environment is entirely Linux. The path to my home
>> directory on one host (blitzen) has been known as /Users/tsakai,
>> even though it is an NFS mount from vixen (where it is known
>> as /home/tsakai). For historical reasons, I have chosen to
>> make a symbolic link named /Users that points to vixen's /home,
>> so that I can use a consistent path on both vixen and blitzen.
>
> okay. Sometimes the protection of the home directory must be adjusted too, but
> as you can do it from the command line this shouldn't be an issue.
>
>
>>> Is this a private cluster (or at least private interfaces)?
>>> It would also be an option to use hostbased authentication,
>>> which will avoid setting any known_hosts file or passphraseless
>>> ssh-keys for each user.
>>
>> No, it is not a private cluster. It is Amazon EC2.
>> When I ssh from my local machine (vixen), I use its public interface,
>> but to address one Amazon cluster node from the other I use the
>> nodes' private DNS names: domU-12-31-39-07-35-21 and
>> domU-12-31-39-06-74-E2. Both public and private DNS names
>> change from one launch to the next. I am using passphraseless
>> ssh-keys for authentication in all cases, i.e., from vixen to
>> Amazon node A, from Amazon node A to Amazon node B, and from
>> Amazon node B back to A. (Please see my initial post. There
>> is a session dialogue for this.) They all work without an
>> authentication dialogue, except for a brief initial dialogue:
>> The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
>> can't be established.
>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>> Are you sure you want to continue connecting (yes/no)?
>> to which I say "yes."
>> But I am unclear about what you mean by "hostbased authentication".
>> Doesn't that mean using a password? If so, it is not an option.
>
> No. It's convenient inside a private cluster, as it won't fill each user's
> known_hosts file and you don't need to create any ssh-keys. But when the
> hostname changes every time, it might also create new host keys. It uses
> host keys (private and public); this way it works for all users. Just for
> reference:
>
> http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
>
> You could look into it later.
>
> ==
>
> - Can you try to use a command when connecting from A to B? E.g.
> `ssh domU-12-31-39-06-74-E2 ls`. Is this working too?
>
> - What about putting:
>
> LogLevel DEBUG3
>
> into your ~/.ssh/config? Maybe we can see in verbose mode what it's trying
> to negotiate before it fails.
>
>
> -- Reuti
>
>
>
>> Regards,
>>
>> Tena
>>
>>
>> On 2/10/11 2:27 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>
>>> Hi,
>>>
>>> your local machine is Linux like, but the execution hosts are Macs? I saw
>>> the /Users/tsakai/... in your output.
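Reuti's suggestion of running a command such as `ls` over ssh from A to B tests exactly what mpirun needs: a non-interactive remote command that never stops at a prompt. Here is a bash sketch of that check, assuming OpenSSH on both nodes. `BatchMode=yes` makes ssh fail rather than ask for a password or passphrase, and, as far as I know, it also turns the first-contact "Are you sure you want to continue connecting (yes/no)?" question into a failure. `check_batch_ssh` and the `SSH_CMD` override are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: run a harmless remote command on each host without any prompts.
# BatchMode=yes makes ssh fail instead of asking for a password or
# passphrase, so a host that would stall mpirun is reported as FAIL.
# SSH_CMD is a hypothetical hook (default: ssh) for dry runs.
check_batch_ssh() {
  local ssh_cmd="${SSH_CMD:-ssh}" host rc=0
  for host in "$@"; do
    if "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=10 "$host" ls >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      rc=1
    fi
  done
  return "$rc"
}
```

On machine A, `check_batch_ssh domU-12-31-39-06-74-E2` should print `OK` for B, and the reverse check belongs on B. A `FAIL` here would point at the same kind of prompt that mpirun cannot answer.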
>>>
>>> a) Is executing a command on them also working? E.g.: ssh
>>> domU-12-31-39-07-35-21 ls
>>>
>>> On 10.02.2011 at 07:08, Tena Sakai wrote:
>>>
>>>> Hi,
>>>>
>>>> I have made a bit of progress(?)...
>>>> I made a config file in my .ssh directory on the cloud. It looks like:
>>>> # machine A
>>>> Host domU-12-31-39-07-35-21.compute-1.internal
>>>
>>> This is just an abbreviation or nickname above. To use the specified
>>> settings, it's necessary to specify exactly this name. When the settings
>>> are the same anyway for all machines, you can use:
>>>
>>> Host *
>>> IdentityFile /home/tsakai/.ssh/tsakai
>>> IdentitiesOnly yes
>>> BatchMode yes
>>>
>>> instead.
>>>
>>> Is this a private cluster (or at least private interfaces)? It would also be
>>> an option to use hostbased authentication, which will avoid setting any
>>> known_hosts file or passphraseless ssh-keys for each user.
>>>
>>> -- Reuti
>>>
>>>
>>>> HostName domU-12-31-39-07-35-21
>>>> BatchMode yes
>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>> ChallengeResponseAuthentication no
>>>> IdentitiesOnly yes
>>>>
>>>> # machine B
>>>> Host domU-12-31-39-06-74-E2.compute-1.internal
>>>> HostName domU-12-31-39-06-74-E2
>>>> BatchMode yes
>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>> ChallengeResponseAuthentication no
>>>> IdentitiesOnly yes
>>>>
>>>> This file exists on both machine A and machine B.
>>>>
>>>> Now when I issue the mpirun command as below:
>>>> [tsakai@domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>>
>>>> It hangs. I Control-C out of it and I get:
>>>> mpirun: killing job...
>>>>
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>> that caused that situation.
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>> below. Additional manual cleanup may be required - please refer to
>>>> the "orte-clean" tool for assistance.
>>>>
>>>> --------------------------------------------------------------------------
>>>> domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
>>>> back when launched
>>>>
>>>> Am I making progress?
>>>>
>>>> Does this mean I am past authentication and something else is the problem?
>>>> Does someone have an example .ssh/config file I can look at? There are so
>>>> many keyword-argument pairs for this config file, and I would like to look
>>>> at a very basic one that works.
>>>>
>>>> Thank you.
>>>>
>>>> Tena Sakai
>>>> tsa...@gallo.ucsf.edu
>>>>
>>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I have an app.ac1 file like below:
>>>>> [tsakai@vixen local]$ cat app.ac1
>>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>>
>>>>> The program I run is
>>>>> Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>>> where x is in [5..8]. The machines vixen and blitzen each do 2 runs.
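Because the EC2 host names change with every launch (as noted earlier in the thread), an appfile like app.ac1 has to be edited each time. One way to avoid hand-editing is to generate it. The bash sketch below reproduces the layout shown above, giving each host a run of consecutive indices; `make_appfile` and its argument order are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch: print an Open MPI appfile shaped like app.ac1, giving each host
# PER_HOST consecutive fib.R indices starting at FIRST.
# Usage (hypothetical helper): make_appfile SCRIPT FIRST PER_HOST HOST...
make_appfile() {
  local script="$1" idx="$2" per_host="$3" host i
  shift 3
  for host in "$@"; do
    for ((i = 0; i < per_host; i++)); do
      # printf avoids echo misreading a line that starts with "-H".
      printf '%s\n' "-H $host -np 1 Rscript $script $idx"
      idx=$((idx + 1))
    done
  done
}
```

For the cloud run later in the thread, `make_appfile /home/tsakai/fib.R 5 2 domU-12-31-39-00-D1-F2 domU-12-31-39-0C-C8-01 > app.ac1` would write the same four lines as the cloud app.ac1 shown there.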
>>>>>
>>>>> Here's the program fib.R:
>>>>> [tsakai@vixen local]$ cat fib.R
>>>>> # fib() computes, given index n, the fibonacci number iteratively
>>>>> # here are the first dozen values (indexed from 0..11)
>>>>> # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>>
>>>>> fib <- function( n ) {
>>>>>   a <- 0
>>>>>   b <- 1
>>>>>   for ( i in 1:n ) {
>>>>>     t <- b
>>>>>     b <- a
>>>>>     a <- a + t
>>>>>   }
>>>>>   a
>>>>> }
>>>>>
>>>>> arg <- commandArgs( TRUE )
>>>>> myHost <- system( 'hostname', intern=TRUE )
>>>>> cat( fib(arg), myHost, '\n' )
>>>>>
>>>>> It reads an argument from the command line and produces the Fibonacci
>>>>> number that corresponds to that index, followed by the machine name.
>>>>> Pretty simple stuff.
>>>>>
>>>>> Here's the run output:
>>>>> [tsakai@vixen local]$ mpirun -app app.ac1
>>>>> 5 vixen.egcrc.org
>>>>> 8 vixen.egcrc.org
>>>>> 13 blitzen.egcrc.org
>>>>> 21 blitzen.egcrc.org
>>>>>
>>>>> Which is exactly what I expect. So far so good.
>>>>>
>>>>> Now I want to run the same thing on the cloud.
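The loop in fib.R keeps two running values and shifts them on each iteration. A quick way to confirm the expected output (5, 8, 13, 21 for indices 5 to 8) without R is to mirror the same loop in bash arithmetic, as in this sketch. One difference is deliberate: in R, `1:0` is the sequence 1, 0, so `fib(0)` runs the loop twice and returns 1, which is why the comment in fib.R lists three leading 1s. The bash loop runs zero times for n = 0 and returns 0, so it only matches the R version for n >= 1.

```bash
#!/usr/bin/env bash
# Sketch: bash mirror of the loop in fib.R, valid for n >= 1.
# (R's 1:0 counts down, so R's fib(0) differs from this version.)
fib() {
  local n="$1" a=0 b=1 t i
  for ((i = 1; i <= n; i++)); do
    t=$b          # remember the old b
    b=$a          # shift a into b
    a=$((a + t))  # new a = old a + old b
  done
  echo "$a"
}
```

`for n in 5 6 7 8; do fib "$n"; done` prints 5, 8, 13 and 21, one per line.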
>>>>> I launch 2 instances of the same
>>>>> virtual machine, which I get to by:
>>>>> [tsakai@vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns
>>>>>
>>>>> Now I am on machine A:
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>> domU-12-31-39-00-D1-F2
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>>>>> Last login: Wed Feb 9 20:51:48 2011 from 10.254.214.4
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ hostname
>>>>> domU-12-31-39-0C-C8-01
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>>>>> The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't be established.
>>>>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>>> Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of known hosts.
>>>>> Last login: Wed Feb 9 20:49:34 2011 from 10.215.203.239
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>> domU-12-31-39-00-D1-F2
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ exit
>>>>> logout
>>>>> Connection to domU-12-31-39-00-D1-F2 closed.
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ exit
>>>>> logout
>>>>> Connection to domU-12-31-39-0C-C8-01 closed.
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>> domU-12-31-39-00-D1-F2
>>>>>
>>>>> As you can see, neither machine uses a password for authentication; it uses
>>>>> public/private key pairs. There is no problem (that I can see) with ssh
>>>>> invocation from one machine to the other. This is so because I have a copy
>>>>> of the public key and a copy of the private key on each instance.
>>>>>
>>>>> The app.ac1 file is identical, except for the node names:
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>>
>>>>> Here's what happens with mpirun:
>>>>>
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>>> tsakai@domu-12-31-39-0c-c8-01's password:
>>>>> Permission denied, please try again.
>>>>> tsakai@domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> mpirun: clean termination accomplished
>>>>>
>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>
>>>>> Mpirun (or somebody else?) asks me for a password, which I don't have.
>>>>> I end up typing Control-C.
>>>>>
>>>>> Here's my question:
>>>>> How can I get past authentication by mpirun when there is no password?
>>>>>
>>>>> I would appreciate your help/insight greatly.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Tena Sakai
>>>>> tsa...@gallo.ucsf.edu
>>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
session4Reuti.text
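The original question in this thread was how mpirun can get past authentication when there is no password. Judging by the password prompt in the first post, the ssh that mpirun starts does not use the key given with `-i` on the command line, so the key has to come from the client configuration instead, which is what Reuti's `Host *` block provides. Below is a bash sketch that writes such a block, assuming OpenSSH and the key path used in this thread; `write_ssh_config` and the `DEBUG` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: write a catch-all client config (Reuti's "Host *" suggestion)
# so every ssh that mpirun starts offers the right key and never stops
# at a password prompt. Overwrites CFG; DEBUG=1 adds "LogLevel DEBUG3".
write_ssh_config() {
  local cfg="$1" key="$2"
  {
    echo "Host *"
    echo "    IdentityFile $key"
    echo "    IdentitiesOnly yes"
    echo "    BatchMode yes"
    if [ "${DEBUG:-0}" = 1 ]; then
      echo "    LogLevel DEBUG3"
    fi
  } > "$cfg"
  # ssh rejects a config file that other users can write to.
  chmod 600 "$cfg"
}
```

Running `write_ssh_config ~/.ssh/config /home/tsakai/.ssh/tsakai` on both nodes replaces any existing config, so a backup comes first. Since `BatchMode yes` suppresses the host-key question, one interactive login in each direction (as in the first post) may still be needed to populate known_hosts.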