Sounds about right. I'm not near a keyboard to check the reasons why pipe(2) 
would fail. 

Specifically, OMPI is failing when it is trying to set up stdin/stdout/stderr 
forwarding for your job. Very strange. 
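
For whenever you are back at a keyboard: on Linux, pipe(2) fails with
EMFILE when the calling process has run out of file descriptors and with
ENFILE when the system-wide file table is full; either would most likely
surface as exactly this kind of I/O forwarding setup error.  A rough
sketch of how to check (and deliberately reproduce) this, assuming a
Linux node and the app.ac file from the transcript; the limit value
below is only an illustration:

  ulimit -n                                        # per-process descriptor limit
  ls /proc/$$/fd | wc -l                           # descriptors this shell has open
  cat /proc/sys/fs/file-nr /proc/sys/fs/file-max   # system-wide usage vs. maximum

  # throwaway shell with an artificially tiny limit; if descriptors are
  # the problem, mpirun should fail here in a similar way:
  bash -c 'ulimit -n 16; mpirun -app app.ac'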

Sent from my PDA. No type good. 

On Feb 11, 2011, at 9:56 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:

> Hi Gus,
> 
> Thank you for your tips.
> 
> I didn't find any smoking gun or anything that comes close.
> Here's the upshot:
> 
>  [tsakai@ip-10-114-239-188 ~]$ ulimit -a
>  core file size          (blocks, -c) 0
>  data seg size           (kbytes, -d) unlimited
>  scheduling priority             (-e) 0
>  file size               (blocks, -f) unlimited
>  pending signals                 (-i) 61504
>  max locked memory       (kbytes, -l) 32
>  max memory size         (kbytes, -m) unlimited
>  open files                      (-n) 1024
>  pipe size            (512 bytes, -p) 8
>  POSIX message queues     (bytes, -q) 819200
>  real-time priority              (-r) 0
>  stack size              (kbytes, -s) 8192
>  cpu time               (seconds, -t) unlimited
>  max user processes              (-u) 61504
>  virtual memory          (kbytes, -v) unlimited
>  file locks                      (-x) unlimited
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$ sudo su
>  bash-3.2#
>  bash-3.2# ulimit -a
>  core file size          (blocks, -c) 0
>  data seg size           (kbytes, -d) unlimited
>  scheduling priority             (-e) 0
>  file size               (blocks, -f) unlimited
>  pending signals                 (-i) 61504
>  max locked memory       (kbytes, -l) 32
>  max memory size         (kbytes, -m) unlimited
>  open files                      (-n) 1024
>  pipe size            (512 bytes, -p) 8
>  POSIX message queues     (bytes, -q) 819200
>  real-time priority              (-r) 0
>  stack size              (kbytes, -s) 8192
>  cpu time               (seconds, -t) unlimited
>  max user processes              (-u) unlimited
>  virtual memory          (kbytes, -v) unlimited
>  file locks                      (-x) unlimited
>  bash-3.2#
>  bash-3.2#
>  bash-3.2# ulimit -a > root_ulimit-a
>  bash-3.2# exit
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$ ulimit -a > tsakai_ulimit-a
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$ diff root_ulimit-a tsakai_ulimit-a
>  14c14
>  < max user processes              (-u) unlimited
>  ---
>  > max user processes              (-u) 61504
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$ cat /proc/sys/fs/file-nr /proc/sys/fs/file-max
>  480     0       762674
>  762674
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$ sudo su
>  bash-3.2#
>  bash-3.2# cat /proc/sys/fs/file-nr /proc/sys/fs/file-max
>  512     0       762674
>  762674
>  bash-3.2# exit
>  exit
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$ sysctl -a |grep fs.file-max
>  -bash: sysctl: command not found
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$ /sbin/!!
>  /sbin/sysctl -a |grep fs.file-max
>  error: permission denied on key 'kernel.cad_pid'
>  error: permission denied on key 'kernel.cap-bound'
>  fs.file-max = 762674
>  [tsakai@ip-10-114-239-188 ~]$
>  [tsakai@ip-10-114-239-188 ~]$ sudo /sbin/sysctl -a | grep fs.file-max
>  fs.file-max = 762674
>  [tsakai@ip-10-114-239-188 ~]$
> 
> I see a bit of difference between root and tsakai, but I cannot
> believe such a small difference results in the kind of catastrophic
> failure I have reported.  Would you agree with me?
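> 
> (One more comparison that may be worth making, since mpirun starts its
> helpers over a non-interactive ssh session: the limits such a session
> inherits can differ from what an interactive login shows.  A minimal
> sketch, with the hostname below as a placeholder:
> 
>   ssh ip-10-XXX-XXX-XXX 'ulimit -n; ulimit -u'
> 
> If those numbers differ from the interactive "ulimit -a" output above,
> that would be a lead.)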
> 
> Regards,
> 
> Tena
> 
> On 2/11/11 6:06 PM, "Gus Correa" <g...@ldeo.columbia.edu> wrote:
> 
>> Hi Tena
>> 
>> Please read one answer inline.
>> 
>> Tena Sakai wrote:
>>> Hi Jeff,
>>> Hi Gus,
>>> 
>>> Thanks for your replies.
>>> 
>>> I have pretty much ruled out PATH issues by setting tsakai's PATH
>>> as identical to that of root.  In that setting I reproduced the
>>> same result as before: root can run mpirun correctly and tsakai
>>> cannot.
>>> 
>>> I have also checked out permissions on the /tmp directory.  tsakai has
>>> no problem creating files under /tmp.
>>> 
>>> I am trying to come up with a strategy to show that each and every
>>> program in the PATH has "world" executable permission.  It is a
>>> stone to turn over, but I am not holding my breath.
>>> 
>>>> ... you are running out of file descriptors. Are file descriptors
>>>> limited on a per-process basis, perchance?
>>> 
>>> I have never heard of such a restriction on Amazon EC2.  There
>>> are folks who keep instances running for a long, long time.  Whereas
>>> in my case, I launch 2 instances, check things out, and then turn
>>> the instances off.  (Given that the state of California has huge
>>> debts, our funding is very tight.)  So, I really doubt that's the
>>> case.  I have run mpirun unsuccessfully as user tsakai and, immediately
>>> after, successfully as root.  Still, I would be happy if you could tell
>>> me a way to tell the number of file descriptors used or remaining.
>>> 
>>> Your mentioning file descriptors made me think of something under
>>> /dev.  But I don't know exactly what I am fishing for.  Do you have
>>> any suggestions?
>>> 
>> 
>> 1) If the environment has anything to do with Linux,
>> check:
>> 
>> cat /proc/sys/fs/file-nr /proc/sys/fs/file-max
>> 
>> 
>> or
>> 
>> sysctl -a |grep fs.file-max
>> 
>> This max can be set (fs.file-max=whatever_is_reasonable)
>> in /etc/sysctl.conf
>> 
>> See 'man sysctl' and 'man sysctl.conf'
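>> 
>> A minimal sketch of that change, with the number purely illustrative:
>> 
>> # in /etc/sysctl.conf:
>> fs.file-max = 1000000
>> 
>> # then apply it without rebooting:
>> /sbin/sysctl -p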
>> 
>> 2) Another possible source of limits.
>> 
>> Check "ulimit -a" (bash) or "limit" (tcsh).
>> 
>> If you need to change them, look at:
>> 
>> /etc/security/limits.conf
>> 
>> (See also 'man limits.conf')
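>> 
>> For instance, a sketch of raising the per-user open-files limit there
>> (values illustrative; it takes effect on the next login):
>> 
>> # in /etc/security/limits.conf:
>> tsakai  soft  nofile  8192
>> tsakai  hard  nofile  16384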
>> 
>> **
>> 
>> Since "root can but Tena cannot",
>> I would check 2) first,
>> as they are the 'per user/per group' limits,
>> whereas 1) is a kernel/system-wide setting.
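>> 
>> And about your question on how to see descriptor usage: a sketch,
>> with <pid> as a placeholder for whatever process you care about
>> (the limits file needs a reasonably recent kernel):
>> 
>> ls /proc/<pid>/fd | wc -l     # descriptors that process has open
>> cat /proc/<pid>/limits        # limits the running process is actually under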
>> 
>> I hope this helps,
>> Gus Correa
>> 
>> PS - I know you are a wise and careful programmer,
>> but here we had cases of programs that would
>> fail because of too many files that were open and never closed,
>> eventually exceeding the max available/permissible.
>> So, it does happen.
>> 
>>> I wish I could reproduce this (weird) behavior on a different
>>> set of machines.  I certainly cannot in my local environment.  Sigh!
>>> 
>>> Regards,
>>> 
>>> Tena
>>> 
>>> 
>>> On 2/11/11 3:17 PM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>>> 
>>>> It is concerning if the pipe system call fails - I can't think of why that
>>>> would happen. That's not usually a permissions issue but rather a deeper
>>>> indication that something is either seriously wrong on your system or you
>>>> are
>>>> running out of file descriptors. Are file descriptors limited on a
>>>> per-process
>>>> basis, perchance?
>>>> 
>>>> Sent from my PDA. No type good.
>>>> 
>>>> On Feb 11, 2011, at 10:08 AM, "Gus Correa" <g...@ldeo.columbia.edu> wrote:
>>>> 
>>>>> Hi Tena
>>>>> 
>>>>> Since root can but you can't,
>>>>> is it a directory permission problem, perhaps?
>>>>> Check the execution directory permissions (on both machines,
>>>>> if this is not an NFS-mounted dir).
>>>>> I am not sure, but IIRR OpenMPI also uses /tmp for
>>>>> under-the-hood stuff, worth checking permissions there also.
>>>>> Just a naive guess.
>>>>> 
>>>>> Congrats for all the progress with the cloudy MPI!
>>>>> 
>>>>> Gus Correa
>>>>> 
>>>>> Tena Sakai wrote:
>>>>>> Hi,
>>>>>> I have made a bit more progress.  I think I can say the ssh
>>>>>> authentication problem is behind me now.  I am still having a problem running
>>>>>> mpirun, but the latest discovery, which I can reproduce, is that
>>>>>> I can run mpirun as root.  Here's the session log:
>>>>>> [tsakai@vixen ec2]$ 2ec2 ec2-184-73-104-242.compute-1.amazonaws.com
>>>>>> Last login: Fri Feb 11 00:41:11 2011 from 10.100.243.195
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ ll
>>>>>> total 8
>>>>>> -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
>>>>>> -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ ll .ssh
>>>>>> total 16
>>>>>> -rw------- 1 tsakai tsakai  232 Feb  5 23:19 authorized_keys
>>>>>> -rw------- 1 tsakai tsakai  102 Feb 11 00:34 config
>>>>>> -rw-r--r-- 1 tsakai tsakai 1302 Feb 11 00:36 known_hosts
>>>>>> -rw------- 1 tsakai tsakai  887 Feb  8 22:03 tsakai
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ ssh ip-10-100-243-195.ec2.internal
>>>>>> Last login: Fri Feb 11 00:36:20 2011 from 10.195.198.31
>>>>>> [tsakai@ip-10-100-243-195 ~]$
>>>>>> [tsakai@ip-10-100-243-195 ~]$ # I am on machine B
>>>>>> [tsakai@ip-10-100-243-195 ~]$ hostname
>>>>>> ip-10-100-243-195
>>>>>> [tsakai@ip-10-100-243-195 ~]$
>>>>>> [tsakai@ip-10-100-243-195 ~]$ ll
>>>>>> total 8
>>>>>> -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:44 app.ac
>>>>>> -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:47 fib.R
>>>>>> [tsakai@ip-10-100-243-195 ~]$
>>>>>> [tsakai@ip-10-100-243-195 ~]$
>>>>>> [tsakai@ip-10-100-243-195 ~]$ cat app.ac
>>>>>> -H ip-10-195-198-31.ec2.internal -np 1 Rscript /home/tsakai/fib.R 5
>>>>>> -H ip-10-195-198-31.ec2.internal -np 1 Rscript /home/tsakai/fib.R 6
>>>>>> -H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 7
>>>>>> -H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 8
>>>>>> [tsakai@ip-10-100-243-195 ~]$
>>>>>> [tsakai@ip-10-100-243-195 ~]$ # go back to machine A
>>>>>> [tsakai@ip-10-100-243-195 ~]$
>>>>>> [tsakai@ip-10-100-243-195 ~]$ exit
>>>>>> logout
>>>>>> Connection to ip-10-100-243-195.ec2.internal closed.
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ hostname
>>>>>> ip-10-195-198-31
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ # Execute mpirun
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ mpirun -app app.ac
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun was unable to launch the specified application as it encountered
>>>>>> an
>>>>>> error:
>>>>>> Error: pipe function call failed when setting up I/O forwarding subsystem
>>>>>> Node: ip-10-195-198-31
>>>>>> while attempting to start process rank 0.
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ # try it as root
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ sudo su
>>>>>> bash-3.2#
>>>>>> bash-3.2# pwd
>>>>>> /home/tsakai
>>>>>> bash-3.2#
>>>>>> bash-3.2# ls -l /root/.ssh/config
>>>>>> -rw------- 1 root root 103 Feb 11 00:56 /root/.ssh/config
>>>>>> bash-3.2#
>>>>>> bash-3.2# cat /root/.ssh/config
>>>>>> Host *
>>>>>>         IdentityFile /root/.ssh/.derobee/.kagi
>>>>>>         IdentitiesOnly yes
>>>>>>         BatchMode yes
>>>>>> bash-3.2#
>>>>>> bash-3.2# pwd
>>>>>> /home/tsakai
>>>>>> bash-3.2#
>>>>>> bash-3.2# ls -l
>>>>>> total 8
>>>>>> -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
>>>>>> -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
>>>>>> bash-3.2#
>>>>>> bash-3.2# # now is the time for mpirun
>>>>>> bash-3.2#
>>>>>> bash-3.2# mpirun --app ./app.ac
>>>>>> 13 ip-10-100-243-195
>>>>>> 21 ip-10-100-243-195
>>>>>> 5 ip-10-195-198-31
>>>>>> 8 ip-10-195-198-31
>>>>>> bash-3.2#
>>>>>> bash-3.2# # It works (being root)!
>>>>>> bash-3.2#
>>>>>> bash-3.2# exit
>>>>>> exit
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ # try it one more time as tsakai
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ mpirun --app app.ac
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun was unable to launch the specified application as it encountered
>>>>>> an
>>>>>> error:
>>>>>> Error: pipe function call failed when setting up I/O forwarding subsystem
>>>>>> Node: ip-10-195-198-31
>>>>>> while attempting to start process rank 0.
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ # I don't get it.
>>>>>> [tsakai@ip-10-195-198-31 ~]$
>>>>>> [tsakai@ip-10-195-198-31 ~]$ exit
>>>>>> logout
>>>>>> [tsakai@vixen ec2]$
>>>>>> So, why does it say "pipe function call failed when setting up
>>>>>> I/O forwarding subsystem Node: ip-10-195-198-31"?
>>>>>> The node it is referring to is not the remote machine.  It is
>>>>>> what I call machine A.  I first thought maybe this is a problem
>>>>>> with the PATH variable.  But I don't think so.  I compared root's
>>>>>> PATH to tsakai's, made them identical, and retried.
>>>>>> I got the same behavior.
>>>>>> If you could enlighten me as to why this is happening, I would really
>>>>>> appreciate it.
>>>>>> Thank you.
>>>>>> Tena
>>>>>> On 2/10/11 4:12 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:
>>>>>>> Hi Jeff,
>>>>>>> 
>>>>>>> Thanks for the firewall tip.  I tried it while allowing all TCP traffic
>>>>>>> and got an interesting and perplexing result.  Here's what's interesting
>>>>>>> (BTW, I got rid of "LogLevel DEBUG3" from .ssh/config on this run):
>>>>>>> 
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ mpirun --app app.ac2
>>>>>>>  Host key verification failed.
>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>>  A daemon (pid 2743) died unexpectedly with status 255 while attempting
>>>>>>>  to launch so we are aborting.
>>>>>>> 
>>>>>>>  There may be more information reported by the environment (see above).
>>>>>>> 
>>>>>>>  This may be because the daemon was unable to find all the needed shared
>>>>>>>  libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>>>>>>> the
>>>>>>>  location of the shared libraries on the remote nodes and this will
>>>>>>>  automatically be forwarded to the remote nodes.
>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>>  mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>  that caused that situation.
>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>>  mpirun: clean termination accomplished
>>>>>>> 
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ env | grep LD_LIB
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ # Let's set LD_LIBRARY_PATH to
>>>>>>> /usr/local/lib
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ # I better to this on machine B as well
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ ssh -i tsakai ip-10-195-171-159
>>>>>>>  Warning: Identity file tsakai not accessible: No such file or
>>>>>>> directory.
>>>>>>>  Last login: Thu Feb 10 18:31:20 2011 from 10.203.21.132
>>>>>>>  [tsakai@ip-10-195-171-159 ~]$
>>>>>>>  [tsakai@ip-10-195-171-159 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
>>>>>>>  [tsakai@ip-10-195-171-159 ~]$
>>>>>>>  [tsakai@ip-10-195-171-159 ~]$ env | grep LD_LIB
>>>>>>>  LD_LIBRARY_PATH=/usr/local/lib
>>>>>>>  [tsakai@ip-10-195-171-159 ~]$
>>>>>>>  [tsakai@ip-10-195-171-159 ~]$ # OK, now go bak to machine A
>>>>>>>  [tsakai@ip-10-195-171-159 ~]$ exit
>>>>>>>  logout
>>>>>>>  Connection to ip-10-195-171-159 closed.
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ hostname
>>>>>>>  ip-10-203-21-132
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ # try mpirun again
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ mpirun --app app.ac2
>>>>>>>  Host key verification failed.
>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>>  A daemon (pid 2789) died unexpectedly with status 255 while attempting
>>>>>>>  to launch so we are aborting.
>>>>>>> 
>>>>>>>  There may be more information reported by the environment (see above).
>>>>>>> 
>>>>>>>  This may be because the daemon was unable to find all the needed shared
>>>>>>>  libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>>>>>>> the
>>>>>>>  location of the shared libraries on the remote nodes and this will
>>>>>>>  automatically be forwarded to the remote nodes.
>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>>  mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>  that caused that situation.
>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>>  mpirun: clean termination accomplished
>>>>>>> 
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ # I thought openmpi library was in
>>>>>>> /usr/local/lib...
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ ll -t /usr/local/lib | less
>>>>>>>  total 16604
>>>>>>>  lrwxrwxrwx 1 root root      16 Feb  8 23:06 libfuse.so ->
>>>>>>> libfuse.so.2.8.5
>>>>>>>  lrwxrwxrwx 1 root root      16 Feb  8 23:06 libfuse.so.2 ->
>>>>>>> libfuse.so.2.8.5
>>>>>>>  lrwxrwxrwx 1 root root      25 Feb  8 23:06 libmca_common_sm.so ->
>>>>>>> libmca_common_sm.so.1.0.0
>>>>>>>  lrwxrwxrwx 1 root root      25 Feb  8 23:06 libmca_common_sm.so.1 ->
>>>>>>> libmca_common_sm.so.1.0.0
>>>>>>>  lrwxrwxrwx 1 root root      15 Feb  8 23:06 libmpi.so ->
>>>>>>> libmpi.so.0.0.2
>>>>>>>  lrwxrwxrwx 1 root root      15 Feb  8 23:06 libmpi.so.0 ->
>>>>>>> libmpi.so.0.0.2
>>>>>>>  lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_cxx.so ->
>>>>>>> libmpi_cxx.so.0.0.1
>>>>>>>  lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_cxx.so.0 ->
>>>>>>> libmpi_cxx.so.0.0.1
>>>>>>>  lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f77.so ->
>>>>>>> libmpi_f77.so.0.0.1
>>>>>>>  lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f77.so.0 ->
>>>>>>> libmpi_f77.so.0.0.1
>>>>>>>  lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f90.so ->
>>>>>>> libmpi_f90.so.0.0.1
>>>>>>>  lrwxrwxrwx 1 root root      19 Feb  8 23:06 libmpi_f90.so.0 ->
>>>>>>> libmpi_f90.so.0.0.1
>>>>>>>  lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-pal.so ->
>>>>>>> libopen-pal.so.0.0.0
>>>>>>>  lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-pal.so.0 ->
>>>>>>> libopen-pal.so.0.0.0
>>>>>>>  lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-rte.so ->
>>>>>>> libopen-rte.so.0.0.0
>>>>>>>  lrwxrwxrwx 1 root root      20 Feb  8 23:06 libopen-rte.so.0 ->
>>>>>>> libopen-rte.so.0.0.0
>>>>>>>  lrwxrwxrwx 1 root root      26 Feb  8 23:06 libopenmpi_malloc.so ->
>>>>>>> libopenmpi_malloc.so.0.0.0
>>>>>>>  lrwxrwxrwx 1 root root      26 Feb  8 23:06 libopenmpi_malloc.so.0 ->
>>>>>>> libopenmpi_malloc.so.0.0.0
>>>>>>>  lrwxrwxrwx 1 root root      20 Feb  8 23:06 libulockmgr.so ->
>>>>>>> libulockmgr.so.1.0.1
>>>>>>>  lrwxrwxrwx 1 root root      20 Feb  8 23:06 libulockmgr.so.1 ->
>>>>>>> libulockmgr.so.1.0.1
>>>>>>>  lrwxrwxrwx 1 root root      16 Feb  8 23:06 libxml2.so ->
>>>>>>> libxml2.so.2.7.2
>>>>>>>  lrwxrwxrwx 1 root root      16 Feb  8 23:06 libxml2.so.2 ->
>>>>>>> libxml2.so.2.7.2
>>>>>>>  -rw-r--r-- 1 root root  385912 Jan 26 01:00 libvt.a
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$ # Now, I am really confused...
>>>>>>>  [tsakai@ip-10-203-21-132 ~]$
>>>>>>> 
>>>>>>> Do you know why it's complaining about shared libraries?
>>>>>>> 
>>>>>>> Thank you.
>>>>>>> 
>>>>>>> Tena
>>>>>>> 
>>>>>>> 
>>>>>>> On 2/10/11 1:05 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>>>>>>> 
>>>>>>>> Your prior mails were about ssh issues, but this one sounds like you
>>>>>>>> might
>>>>>>>> have firewall issues.
>>>>>>>> 
>>>>>>>> That is, the "orted" command attempts to open a TCP socket back to
>>>>>>>> mpirun
>>>>>>>> for
>>>>>>>> various command and control reasons.  If it is blocked from doing so by
>>>>>>>> a
>>>>>>>> firewall, Open MPI won't run.  In general, you can either disable your
>>>>>>>> firewall or you can set up a trust relationship for TCP connections
>>>>>>>> within
>>>>>>>> your
>>>>>>>> cluster.
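>>>>>>>> 
>>>>>>>> As a sketch of the second option, assuming iptables is what filters
>>>>>>>> traffic on the nodes and that 10.0.0.0/8 covers the cluster's private
>>>>>>>> addresses (on EC2 the security group may be the thing to adjust
>>>>>>>> instead):
>>>>>>>> 
>>>>>>>> # run on each node: accept all TCP from the private subnet
>>>>>>>> iptables -I INPUT -s 10.0.0.0/8 -p tcp -j ACCEPT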
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Feb 10, 2011, at 1:03 PM, Tena Sakai wrote:
>>>>>>>> 
>>>>>>>>> Hi Reuti,
>>>>>>>>> 
>>>>>>>>> Thanks for suggesting "LogLevel DEBUG3."  I did so, and the complete
>>>>>>>>> session is captured in the attached file.
>>>>>>>>> 
>>>>>>>>> What I did is very similar to what I have done before: verify
>>>>>>>>> that ssh works and then run the mpirun command.  In my somewhat lengthy
>>>>>>>>> session log, there are two responses from "LogLevel DEBUG3."  First
>>>>>>>>> from an scp invocation and then from mpirun invocation.  They both
>>>>>>>>> say
>>>>>>>>>  debug1: Authentication succeeded (publickey).
>>>>>>>>> 
>>>>>>>>> From the mpirun invocation, I see a line:
>>>>>>>>>  debug1: Sending command:  orted --daemonize -mca ess env -mca
>>>>>>>>> orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs
>>>>>>>>>  2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"
>>>>>>>>> The IP address at the end of the line is indeed that of machine B.
>>>>>>>>> After that it hung and I control-C'ed out of it, which
>>>>>>>>> gave me more lines.  But the lines after
>>>>>>>>>  debug1: Sending command:  orted bla bla bla
>>>>>>>>> don't look good to me.  But, in truth, I have no idea what they
>>>>>>>>> mean.
>>>>>>>>> 
>>>>>>>>> If you could shed some light, I would appreciate it very much.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> 
>>>>>>>>> Tena
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 2/10/11 10:57 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> Am 10.02.2011 um 19:11 schrieb Tena Sakai:
>>>>>>>>>> 
>>>>>>>>>>>> your local machine is Linux like, but the execution hosts
>>>>>>>>>>>> are Macs? I saw the /Users/tsakai/... in your output.
>>>>>>>>>>> No, my environment is entirely Linux.  The path to my home
>>>>>>>>>>> directory on one host (blitzen) has been known as /Users/tsakai,
>>>>>>>>>>> even though it is an NFS mount from vixen (which is known to
>>>>>>>>>>> itself as /home/tsakai).  For historical reasons, I have
>>>>>>>>>>> chosen to give a symbolic link named /Users to vixen's /home,
>>>>>>>>>>> so that I can use a consistent path for both vixen and blitzen.
>>>>>>>>>> okay. Sometimes the protection of the home directory must be adjusted
>>>>>>>>>> too,
>>>>>>>>>> but
>>>>>>>>>> as you can do it from the command line this shouldn't be an issue.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> Is this a private cluster (or at least private interfaces)?
>>>>>>>>>>>> It would also be an option to use hostbased authentication,
>>>>>>>>>>>> which will avoid setting any known_hosts file or passphraseless
>>>>>>>>>>>> ssh-keys for each user.
>>>>>>>>>>> No, it is not a private cluster.  It is Amazon EC2.  When I
>>>>>>>>>>> ssh from my local machine (vixen) I use its public interface,
>>>>>>>>>>> but to address from one Amazon cluster node to the other I
>>>>>>>>>>> use the nodes' private DNS names: domU-12-31-39-07-35-21 and
>>>>>>>>>>> domU-12-31-39-06-74-E2.  Both public and private DNS names
>>>>>>>>>>> change from one launch to another.  I am using passphraseless
>>>>>>>>>>> ssh keys for authentication in all cases, i.e., from vixen to
>>>>>>>>>>> Amazon node A, from Amazon node A to Amazon node B, and from
>>>>>>>>>>> Amazon node B back to A.  (Please see my initial post.  There
>>>>>>>>>>> is a session dialogue for this.)  They all work without an
>>>>>>>>>>> authentication dialogue, except a brief initial one:
>>>>>>>>>>> The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
>>>>>>>>>>> can't be established.
>>>>>>>>>>>  RSA key fingerprint is
>>>>>>>>>>> e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>>>>>>>  Are you sure you want to continue connecting (yes/no)?
>>>>>>>>>>> to which I say "yes."
>>>>>>>>>>> But I am unclear about what you mean by "hostbased authentication".
>>>>>>>>>>> Doesn't that mean with a password?  If so, it is not an option.
>>>>>>>>>> No. It's convenient inside a private cluster as it won't fill each
>>>>>>>>>> user's known_hosts file and you don't need to create any ssh keys. But
>>>>>>>>>> when the hostname changes every time, it might also create new host keys.
>>>>>>>>>> It uses host keys (private and public); this way it works for all users.
>>>>>>>>>> Just for reference:
>>>>>>>>>> 
>>>>>>>>>> http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
>>>>>>>>>> 
>>>>>>>>>> You could look into it later.
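>>>>>>>>>> 
>>>>>>>>>> (A minimal sketch of what it involves, using standard OpenSSH option
>>>>>>>>>> names; file locations can differ per distribution:
>>>>>>>>>> 
>>>>>>>>>> # on each client, in /etc/ssh/ssh_config:
>>>>>>>>>> HostbasedAuthentication yes
>>>>>>>>>> EnableSSHKeysign yes
>>>>>>>>>> 
>>>>>>>>>> # on each server, in /etc/ssh/sshd_config:
>>>>>>>>>> HostbasedAuthentication yes
>>>>>>>>>> 
>>>>>>>>>> # plus the client hostnames in /etc/ssh/shosts.equiv and the client
>>>>>>>>>> # host keys in /etc/ssh/ssh_known_hosts on each server.)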
>>>>>>>>>> 
>>>>>>>>>> ==
>>>>>>>>>> 
>>>>>>>>>> - Can you try to run a command when connecting from A to B? E.g.
>>>>>>>>>> `ssh domU-12-31-39-06-74-E2 ls`. Is this working too?
>>>>>>>>>> 
>>>>>>>>>> - What about putting:
>>>>>>>>>> 
>>>>>>>>>> LogLevel DEBUG3
>>>>>>>>>> 
>>>>>>>>>> in your ~/.ssh/config? Maybe we can see what it's trying to negotiate
>>>>>>>>>> before
>>>>>>>>>> it fails in verbose mode.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -- Reuti
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> 
>>>>>>>>>>> Tena
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 2/10/11 2:27 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> your local machine is Linux like, but the execution hosts are Macs?
>>>>>>>>>>>> I
>>>>>>>>>>>> saw
>>>>>>>>>>>> the
>>>>>>>>>>>> /Users/tsakai/... in your output.
>>>>>>>>>>>> 
>>>>>>>>>>>> a) executing a command on them is also working, e.g.: ssh
>>>>>>>>>>>> domU-12-31-39-07-35-21 ls
>>>>>>>>>>>> 
>>>>>>>>>>>> Am 10.02.2011 um 07:08 schrieb Tena Sakai:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have made a bit of progress(?)...
>>>>>>>>>>>>> I made a config file in my .ssh directory on the cloud.  It looks
>>>>>>>>>>>>> like:
>>>>>>>>>>>>> # machine A
>>>>>>>>>>>>> Host domU-12-31-39-07-35-21.compute-1.internal
>>>>>>>>>>>> This is just an abbreviation or nickname above. To use the 
>>>>>>>>>>>> specified
>>>>>>>>>>>> settings,
>>>>>>>>>>>> it's necessary to specify exactly this name. When the settings are
>>>>>>>>>>>> the
>>>>>>>>>>>> same
>>>>>>>>>>>> anyway for all machines, you can use:
>>>>>>>>>>>> 
>>>>>>>>>>>> Host *
>>>>>>>>>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>>>>>> IdentitiesOnly yes
>>>>>>>>>>>> BatchMode yes
>>>>>>>>>>>> 
>>>>>>>>>>>> instead.
>>>>>>>>>>>> 
>>>>>>>>>>>> Is this a private cluster (or at least private interfaces)? It 
>>>>>>>>>>>> would
>>>>>>>>>>>> also
>>>>>>>>>>>> be
>>>>>>>>>>>> an option to use hostbased authentication, which will avoid setting
>>>>>>>>>>>> any
>>>>>>>>>>>> known_hosts file or passphraseless ssh-keys for each user.
>>>>>>>>>>>> 
>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> HostName domU-12-31-39-07-35-21
>>>>>>>>>>>>> BatchMode yes
>>>>>>>>>>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>>>>>>> ChallengeResponseAuthentication no
>>>>>>>>>>>>> IdentitiesOnly yes
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # machine B
>>>>>>>>>>>>> Host domU-12-31-39-06-74-E2.compute-1.internal
>>>>>>>>>>>>> HostName domU-12-31-39-06-74-E2
>>>>>>>>>>>>> BatchMode yes
>>>>>>>>>>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>>>>>>> ChallengeResponseAuthentication no
>>>>>>>>>>>>> IdentitiesOnly yes
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This file exists on both machine A and machine B.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Now when I issue the mpirun command as below:
>>>>>>>>>>>>> [tsakai@domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It hangs.  I control-C out of it and I get:
>>>>>>>>>>>>> mpirun: killing job...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>>>>>> process
>>>>>>>>>>>>> that caused that situation.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> mpirun was unable to cleanly terminate the daemons on the nodes
>>>>>>>>>>>>> shown
>>>>>>>>>>>>> below. Additional manual cleanup may be required - please refer to
>>>>>>>>>>>>> the "orte-clean" tool for assistance.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>     domU-12-31-39-07-35-21.compute-1.internal - daemon did not
>>>>>>>>>>>>> report
>>>>>>>>>>>>> back when launched
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Am I making progress?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Does this mean I am past authentication and something else is the
>>>>>>>>>>>>> problem?
>>>>>>>>>>>>> Does someone have an example .ssh/config file I can look at?  There
>>>>>>>>>>>>> are so many keyword-argument pairs for this config file and I would
>>>>>>>>>>>>> like to look at a very basic one that works.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Tena Sakai
>>>>>>>>>>>>> tsa...@gallo.ucsf.edu
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsa...@gallo.ucsf.edu> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have an app.ac1 file like below:
>>>>>>>>>>>>> [tsakai@vixen local]$ cat app.ac1
>>>>>>>>>>>>> -H vixen.egcrc.org   -np 1 Rscript
>>>>>>>>>>>>> /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>>>>>>>>>>> -H vixen.egcrc.org   -np 1 Rscript
>>>>>>>>>>>>> /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>>>>>>>>>>> -H blitzen.egcrc.org -np 1 Rscript
>>>>>>>>>>>>> /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>>>>>>>>>>> -H blitzen.egcrc.org -np 1 Rscript
>>>>>>>>>>>>> /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The program I run is
>>>>>>>>>>>>> Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>>>>>>>>>>> where x is [5..8].  The machines vixen and blitzen each execute two
>>>>>>>>>>>>> runs.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here’s the program fib.R:
>>>>>>>>>>>>> [tsakai@vixen local]$ cat fib.R
>>>>>>>>>>>>>     # fib() computes, given index n, fibonacci number iteratively
>>>>>>>>>>>>>     # here's the first dozen sequence (indexed from 0..11)
>>>>>>>>>>>>>     # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>>>>>>>>>> 
>>>>>>>>>>>>> fib <- function( n ) {
>>>>>>>>>>>>>         a <- 0
>>>>>>>>>>>>>         b <- 1
>>>>>>>>>>>>>         for ( i in 1:n ) {
>>>>>>>>>>>>>              t <- b
>>>>>>>>>>>>>              b <- a
>>>>>>>>>>>>>              a <- a + t
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>     a
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> arg <- commandArgs( TRUE )
>>>>>>>>>>>>> myHost <- system( 'hostname', intern=TRUE )
>>>>>>>>>>>>> cat( fib(arg), myHost, '\n' )
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It reads an argument from the command line and produces a Fibonacci
>>>>>>>>>>>>> number that
>>>>>>>>>>>>> corresponds to that index, followed by the machine name.  Pretty
>>>>>>>>>>>>> simple
>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here’s the run output:
>>>>>>>>>>>>> [tsakai@vixen local]$ mpirun -app app.ac1
>>>>>>>>>>>>> 5 vixen.egcrc.org
>>>>>>>>>>>>> 8 vixen.egcrc.org
>>>>>>>>>>>>> 13 blitzen.egcrc.org
>>>>>>>>>>>>> 21 blitzen.egcrc.org
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Which is exactly what I expect.  So far so good.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Now I want to run the same thing on the cloud.  I launch 2 instances
>>>>>>>>>>>>> of the same virtual machine, which I get to by:
>>>>>>>>>>>>> [tsakai@vixen local]$ ssh -A -i ~/.ssh/tsakai
>>>>>>>>>>>>> machine-instance-A-public-dns
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Now I am on machine A:
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B
>>>>>>>>>>>>> without
>>>>>>>>>>>>> password authentication,
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>>>>>> domU-12-31-39-00-D1-F2
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai
>>>>>>>>>>>>> domU-12-31-39-0C-C8-01
>>>>>>>>>>>>> Last login: Wed Feb  9 20:51:48 2011 from 10.254.214.4
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ hostname
>>>>>>>>>>>>> domU-12-31-39-0C-C8-01
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine
>>>>>>>>>>>>> A
>>>>>>>>>>>>> without using password
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai
>>>>>>>>>>>>> domU-12-31-39-00-D1-F2
>>>>>>>>>>>>> The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)'
>>>>>>>>>>>>> can't
>>>>>>>>>>>>> be established.
>>>>>>>>>>>>> RSA key fingerprint is
>>>>>>>>>>>>> e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>>>>>>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>>>>>>>>>>> Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the
>>>>>>>>>>>>> list
>>>>>>>>>>>>> of
>>>>>>>>>>>>> known hosts.
>>>>>>>>>>>>> Last login: Wed Feb  9 20:49:34 2011 from 10.215.203.239
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>>>>>> domU-12-31-39-00-D1-F2
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ exit
>>>>>>>>>>>>> logout
>>>>>>>>>>>>> Connection to domU-12-31-39-00-D1-F2 closed.
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-0C-C8-01 ~]$ exit
>>>>>>>>>>>>> logout
>>>>>>>>>>>>> Connection to domU-12-31-39-0C-C8-01 closed.
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>>>>>> domU-12-31-39-00-D1-F2
>>>>>>>>>>>>> 
>>>>>>>>>>>>> As you can see, neither machine uses a password for authentication;
>>>>>>>>>>>>> each uses public/private key pairs.  There is no problem (that I can
>>>>>>>>>>>>> see) with ssh invocation from one machine to the other.  This is so
>>>>>>>>>>>>> because I have a copy of the public key and a copy of the private key
>>>>>>>>>>>>> on each instance.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The app.ac file is identical, except for the node names:
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>>>>>>>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>>>>>>>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>>>>>>>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>>>>>>>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here’s what happens with mpirun:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>>>>>>>>>>> tsakai@domu-12-31-39-0c-c8-01's password:
>>>>>>>>>>>>> Permission denied, please try again.
>>>>>>>>>>>>> tsakai@domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>>>>>> process
>>>>>>>>>>>>> that caused that situation.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> 
>>>>>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [tsakai@domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Mpirun (or somebody else?) asks me for a password, which I don’t have.
>>>>>>>>>>>>> I end up typing control-C.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here’s my question:
>>>>>>>>>>>>> How can I get past authentication by mpirun when there is no
>>>>>>>>>>>>> password?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I would appreciate your help/insight greatly.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Tena Sakai
>>>>>>>>>>>>> tsa...@gallo.ucsf.edu
