Your prior mails were about ssh issues, but this one sounds like you might have 
firewall issues.

That is, the "orted" command attempts to open a TCP socket back to mpirun for 
various command and control reasons.  If it is blocked from doing so by a 
firewall, Open MPI won't run.  In general, you can either disable your firewall 
or set up a trust relationship for TCP connections within your cluster.
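
One rough way to check this, assuming Python is available on the nodes, is to try opening a TCP connection by hand from the remote node back to the node where mpirun is waiting.  The sketch below is not part of Open MPI; the function name can_connect is made up for illustration.  The address and port to try are the ones in the --hnp-uri argument of the "Sending command: orted ..." line in your debug output (10.194.95.239:54256 in that run; the port will likely differ on each run, so test while mpirun is still hanging).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration only. A refused connection,
    a timeout, or an unreachable host all count as failure.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and routing errors.
        return False


if __name__ == "__main__":
    # Example values taken from the --hnp-uri in the quoted debug log;
    # replace them with the values from your own current run.
    host, port = "10.194.95.239", 54256
    status = "reachable" if can_connect(host, port) else "blocked or closed"
    print(f"{host}:{port} is {status}")
```

If this reports "blocked or closed" while mpirun is still waiting, a firewall (or, on EC2, a security group rule) between the two nodes is a plausible cause, though a failure here does not prove it.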



On Feb 10, 2011, at 1:03 PM, Tena Sakai wrote:

> Hi Reuti,
> 
> Thanks for suggesting "LogLevel DEBUG3."  I did so, and the complete
> session is captured in the attached file.
> 
> What I did is much the same as what I have done before: verify
> that ssh works and then run the mpirun command.  In my somewhat lengthy
> session log, there are two responses from "LogLevel DEBUG3": first
> from an scp invocation and then from the mpirun invocation.  They both
> say
>    debug1: Authentication succeeded (publickey).
> 
> From the mpirun invocation, I see a line:
> 
>    debug1: Sending command:  orted --daemonize -mca ess env -mca orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"
> The IP address at the end of the line is indeed that of machine B.
> After that it hung, and I pressed Control-C to get out of it, which
> gave me more lines.  But the lines after
>    debug1: Sending command:  orted bla bla bla
> don't look good to me.  In truth, though, I have no idea what they
> mean.
> 
> If you could shed some light, I would appreciate it very much.
> 
> Regards,
> 
> Tena
> 
> 
> On 2/10/11 10:57 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
> 
>> Hi,
>> 
>> Am 10.02.2011 um 19:11 schrieb Tena Sakai:
>> 
>>>> your local machine is Linux like, but the execution hosts
>>>> are Macs? I saw the /Users/tsakai/... in your output.
>>> 
>>> No, my environment is entirely Linux.  The path to my home
>>> directory on one host (blitzen) has been known as /Users/tsakai,
>>> even though it is an NFS mount from vixen (which knows it
>>> as /home/tsakai).  For historical reasons, I have
>>> chosen to create a symbolic link named /Users that points to vixen's /home,
>>> so that I can use a consistent path on both vixen and blitzen.
>> 
>> okay. Sometimes the protection of the home directory must be adjusted too,
>> but as you can do it from the command line this shouldn't be an issue.
>> 
>> 
>>>> Is this a private cluster (or at least private interfaces)?
>>>> It would also be an option to use hostbased authentication,
>>>> which will avoid setting any known_hosts file or passphraseless
>>>> ssh-keys for each user.
>>> 
>>> No, it is not a private cluster.  It is Amazon EC2.  When I
>>> ssh from my local machine (vixen) I use its public interface,
>>> but to address one Amazon cluster node from the other I
>>> use the nodes' private DNS names: domU-12-31-39-07-35-21 and
>>> domU-12-31-39-06-74-E2.  Both public and private DNS names
>>> change from one launch to another.  I am using passphraseless
>>> ssh-keys for authentication in all cases, i.e., from vixen to
>>> Amazon node A, from Amazon node A to Amazon node B, and from
>>> Amazon node B back to A.  (Please see my initial post.  There
>>> is a session dialogue for this.)  They all work without an
>>> authentication dialogue, except a brief initial dialogue:
>>>   The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
>>>   can't be established.
>>>    RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>    Are you sure you want to continue connecting (yes/no)?
>>> to which I say "yes."
>>> But I am unclear about what you mean by "hostbased authentication."
>>> Doesn't that mean with a password?  If so, it is not an option.
>> 
>> No. It's convenient inside a private cluster as it won't fill each user's
>> known_hosts file and you don't need to create any ssh-keys. But when the
>> hostname changes every time, new hostkeys might also be created. It uses
>> hostkeys (private and public), so it works for all users. Just for
>> reference:
>> 
>> http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
>> 
>> You could look into it later.
>> 
>> ==
>> 
>> - Can you try to use a command when connecting from A to B? E.g.
>> `ssh domU-12-31-39-06-74-E2 ls`. Is this working too?
>> 
>> - What about putting:
>> 
>> LogLevel DEBUG3
>> 
>> in your ~/.ssh/config? Maybe we can see what it's trying to negotiate before
>> it fails in verbose mode.
>> 
>> 
>> -- Reuti
>> 
>> 
>> 
>>> Regards,
>>> 
>>> Tena
>>> 
>>> 
>>> On 2/10/11 2:27 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> your local machine is Linux like, but the execution hosts are Macs? I saw
>>>> the /Users/tsakai/... in your output.
>>>> 
>>>> a) Is executing a command on them also working? E.g.: ssh
>>>> domU-12-31-39-07-35-21 ls
>>>> 
>>>> Am 10.02.2011 um 07:08 schrieb Tena Sakai:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have made a bit of progress(?)...
>>>>> I made a config file in my .ssh directory on the cloud.  It looks like:
>>>>>   # machine A
>>>>>   Host domU-12-31-39-07-35-21.compute-1.internal
>>>> 
>>>> This is just an abbreviation or nickname above. To use the specified
>>>> settings, it's necessary to specify exactly this name. When the settings
>>>> are the same anyway for all machines, you can use:
>>>> 
>>>> Host *
>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>   IdentitiesOnly yes
>>>>   BatchMode yes
>>>> 
>>>> instead.
>>>> 
>>>> Is this a private cluster (or at least private interfaces)? It would also
>>>> be an option to use hostbased authentication, which will avoid setting any
>>>> known_hosts file or passphraseless ssh-keys for each user.
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>>   HostName domU-12-31-39-07-35-21
>>>>>   BatchMode yes
>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>   ChallengeResponseAuthentication no
>>>>>   IdentitiesOnly yes
>>>>> 
>>>>>   # machine B
>>>>>   Host domU-12-31-39-06-74-E2.compute-1.internal
>>>>>   HostName domU-12-31-39-06-74-E2
>>>>>   BatchMode yes
>>>>>   IdentityFile /home/tsakai/.ssh/tsakai
>>>>>   ChallengeResponseAuthentication no
>>>>>   IdentitiesOnly yes
>>>>> 
>>>>> This file exists on both machine A and machine B.
>>>>> 
>>>>> Now when I issue the mpirun command as below:
>>>>>   [tsakai@domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>>> 
>>>>> It hangs.  I press Control-C to get out of it and I get:
>>>>>   mpirun: killing job...
>>>>> 
>>>>> 
>>>>> --------------------------------------------------------------------------
>>>>>   mpirun noticed that the job aborted, but has no info as to the process
>>>>>   that caused that situation.
>>>>> 
>>>>> --------------------------------------------------------------------------
>>>>> 
>>>>> --------------------------------------------------------------------------
>>>>>   mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>>   below. Additional manual cleanup may be required - please refer to
>>>>>   the "orte-clean" tool for assistance.
>>>>> 
>>>>> --------------------------------------------------------------------------
>>>>>       domU-12-31-39-07-35-21.compute-1.internal - daemon did not report back when launched
>>>>> 
>>>>> Am I making progress?
>>>>> 
>>>>> Does this mean I am past authentication and something else is the problem?
>>>>> Does someone have an example .ssh/config file I can look at?  There are so
>>>>> many keyword-argument pairs for this config file and I would like to look
>>>>> at some very basic one that works.
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> Tena Sakai
>>>>> tsa...@gallo.ucsf.edu
>>>>> 
>>>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:
>>>>> 
>>>>>> Hi
>>>>>> 
>>>>>> I have an app.ac1 file like below:
>>>>>>   [tsakai@vixen local]$ cat app.ac1
>>>>>>   -H vixen.egcrc.org   -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>>>>   -H vixen.egcrc.org   -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>>>>   -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>>>>   -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>>> 
>>>>>> The program I run is
>>>>>>   Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>>>> where x is in [5..8].  The machines vixen and blitzen each do 2 runs.
>>>>>> 
>>>>>> Here’s the program fib.R:
>>>>>>   [ tsakai@vixen local]$ cat fib.R
>>>>>>       # fib() computes, given index n, fibonacci number iteratively
>>>>>>       # here's the first dozen sequence (indexed from 0..11)
>>>>>>       # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>>> 
>>>>>>   fib <- function( n ) {
>>>>>>           a <- 0
>>>>>>           b <- 1
>>>>>>           for ( i in 1:n ) {
>>>>>>                t <- b
>>>>>>                b <- a
>>>>>>                a <- a + t
>>>>>>           }
>>>>>>       a
>>>>>>   }
>>>>>> 
>>>>>>   arg <- commandArgs( TRUE )
>>>>>>   myHost <- system( 'hostname', intern=TRUE )
>>>>>>   cat( fib(arg), myHost, '\n' )
>>>>>> 
>>>>>> It reads an argument from the command line and produces the fibonacci number
>>>>>> that corresponds to that index, followed by the machine name.  Pretty simple
>>>>>> stuff.
>>>>>> 
>>>>>> Here’s the run output:
>>>>>>   [tsakai@vixen local]$ mpirun -app app.ac1
>>>>>>   5 vixen.egcrc.org
>>>>>>   8 vixen.egcrc.org
>>>>>>   13 blitzen.egcrc.org
>>>>>>   21 blitzen.egcrc.org
>>>>>> 
>>>>>> Which is exactly what I expect.  So far so good.
>>>>>> 
>>>>>> Now I want to run the same thing on the cloud.  I launch 2 instances of the
>>>>>> same virtual machine, which I get to by:
>>>>>>   [tsakai@vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns
>>>>>> 
>>>>>> Now I am on machine A:
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>   domU-12-31-39-00-D1-F2
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>>>>>>   Last login: Wed Feb  9 20:51:48 2011 from 10.254.214.4
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ hostname
>>>>>>   domU-12-31-39-0C-C8-01
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>>>>>>   The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't be established.
>>>>>>   RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>>   Are you sure you want to continue connecting (yes/no)? yes
>>>>>>   Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of known hosts.
>>>>>>   Last login: Wed Feb  9 20:49:34 2011 from 10.215.203.239
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>   domU-12-31-39-00-D1-F2
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ exit
>>>>>>   logout
>>>>>>   Connection to domU-12-31-39-00-D1-F2 closed.
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>   [tsakai@domU-12-31-39-0C-C8-01 ~]$ exit
>>>>>>   logout
>>>>>>   Connection to domU-12-31-39-0C-C8-01 closed.
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>   domU-12-31-39-00-D1-F2
>>>>>> 
>>>>>> As you can see, neither machine uses a password for authentication; it uses
>>>>>> public/private key pairs.  There is no problem (that I can see) for ssh
>>>>>> invocation from one machine to the other.  This is so because I have a copy
>>>>>> of the public key and a copy of the private key on each instance.
>>>>>> 
>>>>>> The app.ac1 file is identical, except for the node names:
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>>>>   -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>>>>   -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>>>>   -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>>>>   -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>>> 
>>>>>> Here’s what happens with mpirun:
>>>>>> 
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>>>>   tsakai@domu-12-31-39-0c-c8-01's password:
>>>>>>   Permission denied, please try again.
>>>>>>   tsakai@domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>>> 
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>>   mpirun noticed that the job aborted, but has no info as to the process
>>>>>>   that caused that situation.
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>> 
>>>>>>   mpirun: clean termination accomplished
>>>>>> 
>>>>>>   [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>> 
>>>>>> mpirun (or somebody else?) asks me for a password, which I don’t have.
>>>>>> I end up typing Control-C.
>>>>>> 
>>>>>> Here’s my question:
>>>>>> How can I get past authentication by mpirun when there is no password?
>>>>>> 
>>>>>> I would appreciate your help/insight greatly.
>>>>>> 
>>>>>> Thank you.
>>>>>> 
>>>>>> Tena Sakai
>>>>>> tsa...@gallo.ucsf.edu
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> <session4Reuti.text>


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

