John,

Thanks for the suggestions. In this case there is no cluster manager / job
scheduler; these are just a couple of individual hosts in a rack. The names
look generic because I anonymized the full network addresses in my previous
posts, truncating them to just the hostnames.

My home directory is network-mounted on both hosts. In fact, I uninstalled
Open MPI 3.0.1 from /usr/local on both hosts, installed Open MPI 3.1.0
into my home directory at `/home/user/openmpi_install`, and updated
.bashrc accordingly:

user@b09-30:~$ cat .bashrc
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/user/openmpi_install/bin
export LD_LIBRARY_PATH=/home/user/openmpi_install/lib

So the environment should be the same on both hosts.
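
As a sanity check (hypothetical command, using the same hostnames and install
path as above), something like the following should show whether a
non-interactive ssh session, which is roughly the environment a remotely
launched orted inherits, actually picks up those exports:

user@b09-30:~$ /usr/bin/ssh -x b09-32 'echo $PATH; echo $LD_LIBRARY_PATH; which orted'   # hypothetical check

If the exports are seen, the first line should end in
/home/user/openmpi_install/bin, the second should be
/home/user/openmpi_install/lib, and `which orted` should print the orted
inside that directory; if not, the remote daemon would be starting with a
different environment than my interactive shell.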

Thanks,
Max

On Mon, May 14, 2018 at 12:29 AM, John Hearns via users <users@lists.open-mpi.org> wrote:

> One very, very stupid question here. This arose over on the Slurm list
> actually.
> Those hostnames look like quite generic names, i.e. they are part of an HPC
> cluster?
> Do they happen to have independent home directories for your userid?
> Could that possibly make a difference to the MPI launcher?
>
> On 14 May 2018 at 06:44, Max Mellette <wmell...@ucsd.edu> wrote:
>
>> Hi Gilles,
>>
>> Thanks for the suggestions; the results are below. Any ideas where to go
>> from here?
>>
>> ----- It seems that SELinux is not installed:
>>
>> user@b09-30:~$ sestatus
>> The program 'sestatus' is currently not installed. You can install it by
>> typing:
>> sudo apt install policycoreutils
>>
>> ----- Output from orted:
>>
>> user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
>> [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 147
>> [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 106
>> [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 345
>> [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 270
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>>   orte_session_dir failed
>>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>>
>> ----- iptables rules:
>>
>> user@b09-30:~$ sudo iptables -L
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>> ufw-before-logging-input  all  --  anywhere             anywhere
>> ufw-before-input  all  --  anywhere             anywhere
>> ufw-after-input  all  --  anywhere             anywhere
>> ufw-after-logging-input  all  --  anywhere             anywhere
>> ufw-reject-input  all  --  anywhere             anywhere
>> ufw-track-input  all  --  anywhere             anywhere
>>
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>> ufw-before-logging-forward  all  --  anywhere             anywhere
>> ufw-before-forward  all  --  anywhere             anywhere
>> ufw-after-forward  all  --  anywhere             anywhere
>> ufw-after-logging-forward  all  --  anywhere             anywhere
>> ufw-reject-forward  all  --  anywhere             anywhere
>> ufw-track-forward  all  --  anywhere             anywhere
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>> ufw-before-logging-output  all  --  anywhere             anywhere
>> ufw-before-output  all  --  anywhere             anywhere
>> ufw-after-output  all  --  anywhere             anywhere
>> ufw-after-logging-output  all  --  anywhere             anywhere
>> ufw-reject-output  all  --  anywhere             anywhere
>> ufw-track-output  all  --  anywhere             anywhere
>>
>> Chain ufw-after-forward (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-after-input (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-after-logging-forward (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-after-logging-input (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-after-logging-output (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-after-output (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-before-forward (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-before-input (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-before-logging-forward (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-before-logging-input (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-before-logging-output (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-before-output (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-reject-forward (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-reject-input (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-reject-output (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-track-forward (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-track-input (1 references)
>> target     prot opt source               destination
>>
>> Chain ufw-track-output (1 references)
>> target     prot opt source               destination
>>
>>
>> Thanks,
>> Max
>>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
