Hi Max,

Name resolution in /etc/hosts is a simple solution for (2).
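In case it is useful, here is a minimal sketch of how one might check for the entries on each node. It assumes the usual /etc/hosts layout (an IP address followed by one or more names per line). The function name has_hosts_entry is made up for illustration, and the 192.0.2.x addresses in the comments are documentation placeholders, not the real addresses of b09-30 and b09-32; substitute the IPs that already work with mpirun.

```shell
#!/bin/sh
# Hypothetical helper: succeed if hostname $1 appears in hosts file $2
# (default: /etc/hosts). Comments in the file are ignored.
has_hosts_entry() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }                                     # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }   # names follow the address
        END { exit !found }
    ' "$file"
}

# Entries of this shape would go into /etc/hosts on every node
# (placeholder addresses, NOT the real ones):
#   192.0.2.30  b09-30
#   192.0.2.32  b09-32

# Report which of the hostnames from this thread are already listed locally.
for h in b09-30 b09-32; do
    if has_hosts_entry "$h"; then
        echo "$h: listed in /etc/hosts"
    else
        echo "$h: no /etc/hosts entry"
    fi
done
```

Running this on each node shows whether the name is present in that node's own /etc/hosts; it does not check DNS or any other name service.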
I hope this helps,
Gus

> On May 15, 2018, at 01:39, Max Mellette <wmell...@ucsd.edu> wrote:
>
> Thanks everyone for all your assistance. The problem seems to be resolved
> now, although I'm not entirely sure why these changes made a difference.
> There were two things I changed:
>
> (1) I had some additional `export ...` lines in .bashrc before the
> `export PATH=...` and `export LD_LIBRARY_PATH=...` lines. When I removed
> those lines (and then later added them back in below the PATH and
> LD_LIBRARY_PATH lines), mpirun worked. But only b09-30 was able to execute
> code on b09-32 and not the other way around.
>
> (2) I passed IP addresses to mpirun instead of the hostnames (this didn't
> work previously), and now mpirun works in both directions (b09-30 -> b09-32
> and b09-32 -> b09-30). I added a 3rd host in the rack and mpirun still works
> when passing IP addresses. For some reason using the hostname doesn't work
> despite the fact that I can use it to ssh.
>
> Also FWIW I wasn't using a debugger.
>
> Thanks again,
> Max
>
>
> On Mon, May 14, 2018 at 4:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
> In the initial report, the /usr/bin/ssh process was in the 'T' state
> (which generally hints that a debugger is attached to the process).
>
>     /usr/bin/ssh -x b09-32 orted
>
> did behave as expected (i.e. orted was executed, exited with an error since
> the command line is invalid, and the error message was received).
>
> Can you try to run
>
>     /home/user/openmpi_install/bin/mpirun --host b09-30,b09-32 hostname
>
> and see how things go? (Since you simply ran 'ssh orted', another orted
> might be used.)
>
> If you are still facing the same hang with ssh in the 'T' state, can you
> check the logs on b09-32 and see if the sshd server was even contacted?
> I can hardly make sense of this error, fwiw.
>
> Cheers,
>
> Gilles
>
> On 5/15/2018 5:27 AM, r...@open-mpi.org wrote:
>
> You got that error because the orted is looking for its rank on the cmd line
> and not finding it.
>
>
> On May 14, 2018, at 12:37 PM, Max Mellette <wmell...@ucsd.edu> wrote:
>
> Hi Gus,
>
> Thanks for the suggestions. The correct version of openmpi seems to be
> getting picked up; I also prepended the installation path to PATH in
> .bashrc as you suggested, but it didn't seem to help:
>
> user@b09-30:~$ cat .bashrc
> export PATH=/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
> export LD_LIBRARY_PATH=/home/user/openmpi_install/lib
> user@b09-30:~$ which mpicc
> /home/user/openmpi_install/bin/mpicc
> user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 147
> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 106
> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 345
> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 270
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_session_dir failed
>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
>
> Thanks,
> Max
>
>
> On Mon, May 14, 2018 at 11:41 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
> Hi Max
>
> Just in case, since environment mix-ups happen often:
> could it be that you are inadvertently picking up another
> installation of OpenMPI, perhaps installed from packages
> in /usr or /usr/local?
> That's easy to check with 'which mpiexec' or
> 'which mpicc', for instance.
>
> Have you tried to prepend (as opposed to append) OpenMPI
> to your PATH? Say:
>
> export PATH='/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
>
> I hope this helps,
> Gus Correa
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users