Hi Max

Name resolution through /etc/hosts (the same hostname-to-IP entries on every node) is a simple solution for (2).
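Here is a minimal sketch of that approach. The addresses are placeholders from the 192.0.2.0/24 documentation range, so substitute the real ones. The helper name `add_host_entries` is made up for illustration. It takes the hosts file as an argument so you can try it on a scratch copy before editing /etc/hosts, and it only appends names that are not already listed.

```shell
# Hypothetical helper (not part of any standard tool): append "IP<TAB>name"
# lines to a hosts file, skipping names that already have an entry.
# Usage: add_host_entries FILE IP NAME [IP NAME ...]
add_host_entries() {
    local file=$1
    shift
    while [ "$#" -ge 2 ]; do
        local ip=$1 name=$2
        shift 2
        # Look for NAME as a whole field after the address, ignoring comments.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file" 2>/dev/null; then
            printf '%s\t%s\n' "$ip" "$name" >> "$file"
        fi
    done
}

# Example (placeholder addresses; run as root on every node):
#   add_host_entries /etc/hosts 192.0.2.30 b09-30 192.0.2.32 b09-32
#   getent hosts b09-30 b09-32   # each name should print its address
```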

I hope this helps,
Gus

> On May 15, 2018, at 01:39, Max Mellette <wmell...@ucsd.edu> wrote:
> 
> Thanks everyone for all your assistance. The problem seems to be resolved 
> now, although I'm not entirely sure why these changes made a difference. 
> There were two things I changed:
> 
> (1) I had some additional `export ...` lines in .bashrc before the
> `export PATH=...` and `export LD_LIBRARY_PATH=...` lines. When I removed
> those lines (and later added them back in below the PATH and
> LD_LIBRARY_PATH lines), mpirun worked. However, only b09-30 was able to
> execute code on b09-32, not the other way around.
> 
> (2) I passed IP addresses to mpirun instead of the hostnames (this didn't
> work previously), and now mpirun works in both directions (b09-30 -> b09-32
> and b09-32 -> b09-30). I added a third host in the rack, and mpirun still
> works when passing IP addresses. For some reason, using the hostname doesn't
> work, even though I can use it to ssh.
> 
> Also FWIW I wasn't using a debugger.
> 
> Thanks again,
> Max
> 
> 
> On Mon, May 14, 2018 at 4:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> In the initial report, the /usr/bin/ssh process was in the 'T' state
> (which generally hints that the process is attached to a debugger).
> 
> /usr/bin/ssh -x b09-32 orted
> 
> did behave as expected (i.e. orted was executed, exited with an error since
> the command line is invalid, and the error message was received).
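On Linux, the 'T' state can be cross-checked against /proc: the `TracerPid` field of /proc/PID/status is non-zero when another process, such as a debugger or strace, is ptrace-attached. Below is a minimal sketch that prints both fields for one PID. The function name is made up for illustration.

```shell
# Hypothetical helper (Linux only): print the scheduler state and the PID of
# any ptrace tracer for one process, read from /proc/PID/status.
# State "T" means stopped (e.g. by a signal); since Linux 2.6.33 a ptrace
# stop is reported as "t". TracerPid is 0 when nothing is attached.
# Usage: proc_trace_info PID
proc_trace_info() {
    local status="/proc/$1/status"
    if [ ! -r "$status" ]; then
        echo "no such process: $1" >&2
        return 1
    fi
    awk '/^State:/     { state = $2 }
         /^TracerPid:/ { tracer = $2 }
         END { printf "state=%s tracer=%s\n", state, tracer }' "$status"
}

# Example: check every running ssh client process.
#   for pid in $(pgrep -x ssh); do echo "$pid $(proc_trace_info "$pid")"; done
```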
> 
> 
> Can you try to run
> 
> /home/user/openmpi_install/bin/mpirun --host b09-30,b09-32 hostname
> 
> and see how things go? (Since you simply ran 'ssh b09-32 orted', a different
> orted might have been used.)
> 
> If you are still facing the same hang with ssh in the 'T' state, can you
> check the logs on b09-32 and see if the sshd server was even contacted?
> FWIW, I can hardly make sense of this error.
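The sketch below shows one way to do that check, under some assumptions. On Debian/Ubuntu, sshd normally logs to /var/log/auth.log. On systemd-based systems the same messages are also in the journal, under the `ssh` or `sshd` unit depending on the distribution. The helper is hypothetical and simply filters the sshd lines that mention the client's address. If nothing from b09-30's address appears around the time of the hang, the connection may not have reached sshd at all.

```shell
# Hypothetical helper: show the sshd lines in a log file that mention a
# given client address. Returns non-zero if there are none.
# Usage: sshd_lines_from LOGFILE CLIENT_ADDRESS
sshd_lines_from() {
    local log=$1 client=$2
    # -F matches the address literally, so its dots are not regex wildcards.
    grep 'sshd\[' "$log" | grep -F -- "$client"
}

# Example on b09-32 (Debian/Ubuntu log path; placeholder address for b09-30):
#   sshd_lines_from /var/log/auth.log 192.0.2.30
# On systemd-based systems the journal can be filtered the same way:
#   journalctl -u ssh | grep -F 192.0.2.30
```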
> 
> 
> Cheers,
> 
> Gilles
> 
> On 5/15/2018 5:27 AM, r...@open-mpi.org wrote:
> You got that error because the orted is looking for its rank on the cmd line 
> and not finding it.
> 
> 
> On May 14, 2018, at 12:37 PM, Max Mellette <wmell...@ucsd.edu> wrote:
> 
> Hi Gus,
> 
> Thanks for the suggestions. The correct version of Open MPI seems to be
> getting picked up; I also prepended the installation path in .bashrc as
> you suggested, but it didn't seem to help:
> 
> user@b09-30:~$ cat .bashrc
> export PATH=/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
> export LD_LIBRARY_PATH=/home/user/openmpi_install/lib
> user@b09-30:~$ which mpicc
> /home/user/openmpi_install/bin/mpicc
> user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 147
> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 106
> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 345
> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 270
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   orte_session_dir failed
>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> 
> Thanks,
> Max
> 
> 
> On Mon, May 14, 2018 at 11:41 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> 
>     Hi Max
> 
>     Just in case, since environment mix-ups happen often:
>     could it be that you are inadvertently picking up another
>     installation of Open MPI, perhaps installed from packages
>     in /usr or /usr/local?
>     That's easy to check with 'which mpiexec' or
>     'which mpicc', for instance.
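`which` reports only the first match. To see every copy that could be picked up (roughly what bash's `type -a` shows), a small hypothetical helper can walk PATH in order. A remote, non-interactive ssh session may have a different PATH than your login shell, so it is also worth checking the remote side directly, e.g. `ssh b09-32 'command -v orted'`.

```shell
# Hypothetical helper: list every executable named NAME on PATH, in search
# order. The first line is what normally runs; later lines are other
# installations that a differently ordered PATH could pick up.
# Usage: all_on_path NAME
all_on_path() {
    local dir found=1
    local IFS=:
    for dir in $PATH; do
        dir=${dir:-.}    # an empty PATH entry means the current directory
        if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            found=0
        fi
    done
    return "$found"
}

# Example: all_on_path mpicc; all_on_path mpirun
```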
> 
>     Have you tried prepending (as opposed to appending) the Open MPI
>     bin directory to your PATH? Say:
> 
>     export PATH='/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
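Here is a variation on the same idea, as a sketch. A small hypothetical helper puts a directory first in PATH or LD_LIBRARY_PATH without adding a duplicate each time ~/.bashrc is read. Placing these calls near the top of ~/.bashrc may matter. mpirun starts orted over a non-interactive ssh session, and many stock .bashrc files (Ubuntu's included) return early for non-interactive shells, so exports placed after that guard are never reached.

```shell
# Hypothetical helper: put DIR first in a colon-separated variable such as
# PATH or LD_LIBRARY_PATH, dropping any other copy of DIR, then export it.
# Re-reading ~/.bashrc therefore does not keep growing the list.
# Usage: path_prepend VARNAME DIR
path_prepend() {
    local var=$1 dir=$2 old new entry
    eval "old=\${$var-}"       # current value of the named variable
    new=$dir
    local IFS=:
    set -f                     # no filename globbing while splitting
    for entry in $old; do
        [ "$entry" = "$dir" ] || new=$new:$entry
    done
    set +f
    eval "$var=\$new; export $var"
}

# Example, near the top of ~/.bashrc (before any non-interactive "return"):
#   path_prepend PATH /home/user/openmpi_install/bin
#   path_prepend LD_LIBRARY_PATH /home/user/openmpi_install/lib
```

Unlike a plain `export LD_LIBRARY_PATH=...`, this keeps any library directories that were already set.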
> 
>     I hope this helps,
>     Gus Correa
> 
> 
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users