Thanks everyone for all your assistance. The problem seems to be resolved now, although I'm not entirely sure why these changes made a difference. There were two things I changed:
(1) I had some additional `export ...` lines in .bashrc before the `export PATH=...` and `export LD_LIBRARY_PATH=...` lines. When I removed those lines (and later added them back below the PATH and LD_LIBRARY_PATH lines), mpirun worked, but only b09-30 was able to execute code on b09-32 and not the other way around.

(2) I passed IP addresses to mpirun instead of the hostnames (this didn't work previously), and now mpirun works in both directions (b09-30 -> b09-32 and b09-32 -> b09-30). I added a 3rd host in the rack and mpirun still works when passing IP addresses. For some reason using the hostname doesn't work, despite the fact that I can use it to ssh.

Also, FWIW, I wasn't using a debugger.

Thanks again,
Max

On Mon, May 14, 2018 at 4:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> In the initial report, the /usr/bin/ssh process was in the 'T' state
> (which generally hints that the process is attached to a debugger)
>
> /usr/bin/ssh -x b09-32 orted
>
> did behave as expected (e.g. orted was executed, exited with an error
> since the command line is invalid, and the error message was received)
>
> Can you try to run
>
> /home/user/openmpi_install/bin/mpirun --host b09-30,b09-32 hostname
>
> and see how things go? (Since you simply 'ssh orted', another orted
> might be used.)
>
> If you are still facing the same hang with ssh in the 'T' state, can you
> check the logs on b09-32 and see if the sshd server was even contacted?
> I can hardly make sense of this error fwiw.
>
> Cheers,
>
> Gilles
>
> On 5/15/2018 5:27 AM, r...@open-mpi.org wrote:
>
>> You got that error because the orted is looking for its rank on the cmd
>> line and not finding it.
>>
>> On May 14, 2018, at 12:37 PM, Max Mellette <wmell...@ucsd.edu> wrote:
>>
>>> Hi Gus,
>>>
>>> Thanks for the suggestions. The correct version of openmpi seems to be
>>> getting picked up; I also prepended .bashrc with the installation path
>>> like you suggested, but it didn't seem to help:
>>>
>>> user@b09-30:~$ cat .bashrc
>>> export PATH=/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
>>> export LD_LIBRARY_PATH=/home/user/openmpi_install/lib
>>> user@b09-30:~$ which mpicc
>>> /home/user/openmpi_install/bin/mpicc
>>> user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
>>> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 147
>>> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 106
>>> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 345
>>> [b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 270
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   orte_session_dir failed
>>>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>>
>>> Thanks,
>>> Max
>>>
>>> On Mon, May 14, 2018 at 11:41 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>
>>>     Hi Max
>>>
>>>     Just in case, as environment mix often happens.
>>>     Could it be that you are inadvertently picking another
>>>     installation of OpenMPI, perhaps installed from packages
>>>     in /usr, or /usr/local?
>>>     That's easy to check with 'which mpiexec' or
>>>     'which mpicc', for instance.
>>>
>>>     Have you tried to prepend (as opposed to append) OpenMPI
>>>     to your PATH? Say:
>>>
>>>     export PATH='/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
>>>
>>>     I hope this helps,
>>>     Gus Correa
>>>
>>> _______________________________________________
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/users
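Since the hostnames work for ssh but not for mpirun, one thing that may be worth checking is whether /etc/hosts on any node maps that node's own name to a loopback address. Debian and Ubuntu installers commonly add a `127.0.1.1 <hostname>` line. This is only a guess and was not confirmed as the cause in this thread. The following is a minimal sketch; the helper names `hosts_entries_for` and `is_loopback` are made up for illustration and are not part of Open MPI.

```shell
#!/bin/sh
# Hypothetical helper: print every address that a hosts-format file
# maps to the given name. FILE defaults to /etc/hosts.
# Usage: hosts_entries_for NAME [FILE]
hosts_entries_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }                # drop comments; awk re-splits the fields
        NF >= 2 {
            for (i = 2; i <= NF; i++)     # fields 2..NF are the name and its aliases
                if ($i == name) { print $1; break }
        }
    ' "$file"
}

# Hypothetical helper: succeed if the address is a loopback address.
is_loopback() {
    case $1 in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example (run on each node; host names taken from this thread):
#   for h in b09-30 b09-32; do
#       for addr in $(hosts_entries_for "$h"); do
#           if is_loopback "$addr"; then
#               echo "$h -> $addr (loopback, other nodes cannot reach this)"
#           else
#               echo "$h -> $addr"
#           fi
#       done
#   done
```

If a node's own name shows up against a loopback address, that would be consistent with IP addresses working where hostnames do not, but other causes (DNS, multiple interfaces) are just as possible.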