Thanks everyone for all your assistance. The problem seems to be resolved
now, although I'm not entirely sure why these changes made a difference.
There were two things I changed:

(1) I had some additional `export ...` lines in .bashrc before the
`export PATH=...` and `export LD_LIBRARY_PATH=...` lines. When I removed
those lines (and later added them back below the PATH and LD_LIBRARY_PATH
lines), mpirun worked, but only in one direction: b09-30 could execute
code on b09-32, but not the other way around.
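
One common cause of this kind of ordering sensitivity (an assumption, not
something confirmed for these nodes) is that the non-interactive shell ssh
starts for a remote command stops reading .bashrc early. Ubuntu's default
.bashrc, for example, returns near the top when `$-` does not contain `i`,
so any export placed after that point never reaches orted on the remote
node. The sketch below uses a made-up rc file to show the effect with
plain sh; it illustrates the mechanism and is not a copy of Max's .bashrc.

```shell
#!/bin/sh
# Sketch: exports placed after an "interactive only" guard are skipped by a
# non-interactive shell, like the one ssh uses to run a remote command.
# The rc file written here is hypothetical.
tmp=$(mktemp -d)
cat > "$tmp/rc" <<'EOF'
export EARLY=yes                       # set before the guard
case $- in *i*) ;; *) return ;; esac   # non-interactive shells stop here
export LATE=yes                        # only reached by interactive shells
EOF

# Source the rc file from a fresh non-interactive shell and report what it set.
result=$(sh -c 'unset EARLY LATE; . "$1"; echo "EARLY=${EARLY-unset} LATE=${LATE-unset}"' sh "$tmp/rc")
echo "$result"    # EARLY=yes LATE=unset
rm -rf "$tmp"
```

If a guard like this is present, moving the PATH and LD_LIBRARY_PATH
exports above it is the usual remedy.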

(2) I passed IP addresses to mpirun instead of the hostnames (this didn't
work previously), and now mpirun works in both directions (b09-30 -> b09-32
and b09-32 -> b09-30). I added a third host in the rack and mpirun still
works when passing IP addresses. For some reason, hostnames still don't
work, even though I can ssh to the hosts by name.

Also FWIW I wasn't using a debugger.
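
For reference on the 'T' state Gilles mentions below: the Linux procps
`ps` man page lists STAT 'T' as "stopped by job control signal" and 't' as
"stopped by debugger during the tracing", so 'T' by itself does not prove
a debugger is attached. The sketch below stops a throwaway `sleep` with
SIGSTOP and reads its state from /proc/PID/status; it assumes Linux and a
mounted /proc.

```shell
#!/bin/sh
# Sketch (Linux only): a process stopped by SIGSTOP reports state T.
sleep 30 &
pid=$!
kill -STOP "$pid"

# The state change is asynchronous, so poll /proc for a few seconds.
state=
for attempt in 1 2 3 4 5; do
  state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status")
  [ "$state" = "T" ] && break
  sleep 1
done
echo "sleep ($pid) is in state: $state"

kill -KILL "$pid"           # SIGKILL also terminates a stopped process
wait "$pid" 2>/dev/null || true
```

On the affected node, `ps -o pid,stat,args -C ssh` shows the same STAT
letter for the ssh processes that mpirun started.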

Thanks again,
Max
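
Gus's suggestion further down, prepending rather than appending the Open
MPI bin directory, can also be done without hard-coding the whole system
PATH. `prepend_path` below is a hypothetical POSIX sh helper (not part of
Open MPI): it removes any existing copy of a directory from PATH and puts
it first, so sourcing .bashrc more than once does not keep growing PATH.

```shell
#!/bin/sh
# prepend_path DIR (hypothetical helper): drop every existing copy of DIR
# from PATH, then put DIR first so it wins the command lookup.
prepend_path() {
  _pp_dir=$1
  _pp_rest=
  _pp_left=${PATH:+$PATH:}      # trailing ':' marks the end of the last entry
  while [ -n "$_pp_left" ]; do
    _pp_entry=${_pp_left%%:*}   # first remaining entry
    _pp_left=${_pp_left#*:}     # remove that entry and its ':'
    if [ "$_pp_entry" != "$_pp_dir" ]; then
      _pp_rest=$_pp_rest:$_pp_entry
    fi
  done
  PATH=$_pp_dir$_pp_rest
  export PATH
}

# Example with the install prefix used in this thread.
old_path=$PATH
PATH=/usr/local/bin:/home/user/openmpi_install/bin:/usr/bin
prepend_path /home/user/openmpi_install/bin
result=$PATH
PATH=$old_path      # restore the real PATH
echo "$result"      # /home/user/openmpi_install/bin:/usr/local/bin:/usr/bin
```

In .bashrc this would be a single `prepend_path
/home/user/openmpi_install/bin` near the top of the file. LD_LIBRARY_PATH
would need the same treatment with the variable name as a parameter, which
this sketch leaves out.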


On Mon, May 14, 2018 at 4:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> In the initial report, the /usr/bin/ssh process was in the 'T' state
> (this generally hints that the process is attached to a debugger).
>
> /usr/bin/ssh -x b09-32 orted
>
> did behave as expected (i.e. orted was executed, exited with an error
> since the command line is invalid, and the error message was received).
>
>
> Can you try to run
>
> /home/user/openmpi_install/bin/mpirun --host b09-30,b09-32 hostname
>
> and see how things go? (Since you simply ran 'ssh orted', another orted
> might be used.)
>
> If you are still facing the same hang with ssh in the 'T' state, can you
> check the logs on b09-32 and see if the sshd server was even contacted?
> I can hardly make sense of this error, fwiw.
>
>
> Cheers,
>
> Gilles
>
> On 5/15/2018 5:27 AM, r...@open-mpi.org wrote:
>
>> You got that error because the orted is looking for its rank on the cmd
>> line and not finding it.
>>
>>
>> On May 14, 2018, at 12:37 PM, Max Mellette <wmell...@ucsd.edu> wrote:
>>>
>>> Hi Gus,
>>>
>>> Thanks for the suggestions. The correct version of openmpi seems to be
>>> getting picked up; I also prepended the installation path to PATH in
>>> .bashrc like you suggested, but it didn't seem to help:
>>>
>>> user@b09-30:~$ cat .bashrc
>>> export PATH=/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
>>> export LD_LIBRARY_PATH=/home/user/openmpi_install/lib
>>> user@b09-30:~$ which mpicc
>>> /home/user/openmpi_install/bin/mpicc
>>> user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
>>> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 147
>>> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 106
>>> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 345
>>> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 270
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   orte_session_dir failed
>>>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>>
>>> Thanks,
>>> Max
>>>
>>>
>>> On Mon, May 14, 2018 at 11:41 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>
>>>     Hi Max
>>>
>>>     Just in case, since environment mix-ups happen often:
>>>     Could it be that you are inadvertently picking another
>>>     installation of OpenMPI, perhaps installed from packages
>>>     in /usr , or /usr/local?
>>>     That's easy to check with 'which mpiexec' or
>>>     'which mpicc', for instance.
>>>
>>>     Have you tried to prepend (as opposed to append) OpenMPI
>>>     to your PATH? Say:
>>>
>>>     export PATH='/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
>>>
>>>     I hope this helps,
>>>     Gus Correa
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/users
>>>
>>
>