John,

Thanks for the suggestions. In this case there is no cluster manager or job scheduler; these are just a couple of individual hosts in a rack. The generic names are because I anonymized the full network addresses in my previous posts, truncating them to just the host names.
My home directory is network-mounted on both hosts. In fact, I uninstalled OpenMPI 3.0.1 from /usr/local on both hosts and installed OpenMPI 3.1.0 into my home directory at `/home/user/openmpi_install`, also updating .bashrc appropriately:

user@b09-30:~$ cat .bashrc
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/user/openmpi_install/bin
export LD_LIBRARY_PATH=/home/user/openmpi_install/lib

So the environment should be the same on both hosts.

Thanks,
Max

On Mon, May 14, 2018 at 12:29 AM, John Hearns via users <users@lists.open-mpi.org> wrote:
> One very, very stupid question here. This arose over on the Slurm list actually.
> Those hostnames look like quite generic names, i.e. they are part of an HPC cluster?
> Do they happen to have independent home directories for your userid?
> Could that possibly make a difference to the MPI launcher?
>
> On 14 May 2018 at 06:44, Max Mellette <wmell...@ucsd.edu> wrote:
>
>> Hi Gilles,
>>
>> Thanks for the suggestions; the results are below. Any ideas where to go from here?
>>
>> ----- Seems that selinux is not installed:
>>
>> user@b09-30:~$ sestatus
>> The program 'sestatus' is currently not installed.
>> You can install it by typing:
>> sudo apt install policycoreutils
>>
>> ----- Output from orted:
>>
>> user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
>> [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 147
>> [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 106
>> [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 345
>> [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 270
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_session_dir failed
>> --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>>
>> ----- iptables rules:
>>
>> user@b09-30:~$ sudo iptables -L
>> Chain INPUT (policy ACCEPT)
>> target                      prot opt source    destination
>> ufw-before-logging-input    all  --  anywhere  anywhere
>> ufw-before-input            all  --  anywhere  anywhere
>> ufw-after-input             all  --  anywhere  anywhere
>> ufw-after-logging-input     all  --  anywhere  anywhere
>> ufw-reject-input            all  --  anywhere  anywhere
>> ufw-track-input             all  --  anywhere  anywhere
>>
>> Chain FORWARD (policy ACCEPT)
>> target                      prot opt source    destination
>> ufw-before-logging-forward  all  --  anywhere  anywhere
>> ufw-before-forward          all  --  anywhere  anywhere
>> ufw-after-forward           all  --  anywhere  anywhere
>> ufw-after-logging-forward   all  --  anywhere  anywhere
>> ufw-reject-forward          all  --  anywhere  anywhere
>> ufw-track-forward           all  --  anywhere  anywhere
>>
>> Chain OUTPUT (policy ACCEPT)
>> target                      prot opt source    destination
>> ufw-before-logging-output   all  --  anywhere  anywhere
>> ufw-before-output           all  --  anywhere  anywhere
>> ufw-after-output            all  --  anywhere  anywhere
>> ufw-after-logging-output    all  --  anywhere  anywhere
>> ufw-reject-output           all  --  anywhere  anywhere
>> ufw-track-output            all  --  anywhere  anywhere
>>
>> Chain ufw-after-forward (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-after-input (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-after-logging-forward (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-after-logging-input (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-after-logging-output (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-after-output (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-before-forward (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-before-input (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-before-logging-forward (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-before-logging-input (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-before-logging-output (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-before-output (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-reject-forward (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-reject-input (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-reject-output (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-track-forward (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-track-input (1 references)
>> target  prot opt source  destination
>>
>> Chain ufw-track-output (1 references)
>> target  prot opt source  destination
>>
>> Thanks,
>> Max
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
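A note on the environment setup discussed above: `mpirun` starts `orted` on the remote host through a non-interactive SSH session, so the PATH and LD_LIBRARY_PATH exported in .bashrc must be visible to that kind of session, not only to interactive logins. A minimal sketch of the check, assuming the hostname `b09-32` from the thread (everything else is standard ssh/bash):

```shell
# Print the environment that a non-interactive SSH session sees on the
# remote host -- the same kind of session mpirun uses to launch orted.
# If the PATH shown here lacks /home/user/openmpi_install/bin, the
# .bashrc edits are not being read by non-interactive shells.
ssh b09-32 'echo "PATH=$PATH"; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"; command -v orted || echo "orted not found"'
```

One common cause: on many Debian/Ubuntu systems the stock .bashrc begins with an interactivity guard (`case $- in *i*) ;; *) return;; esac`), so any exports placed below it never reach non-interactive sessions; such exports need to go above the guard, or the launch should use `mpirun --prefix /home/user/openmpi_install ...` so Open MPI sets the paths on the remote side itself.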