Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Allan Overstreet
Below are the results from the ibnetdiscover command. This command was run from node smd.

#
# Topology file: generated on Fri May 19 15:59:47 2017
#
# Initiated from node 0002c903000a0a32 port 0002c903000a0a34
vendid=0x8f1
devid=0x5a5a
sysimgguid=0x8f105001094d3
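
As a rough sketch (not quoted from the thread), a topology dump like the one above is typically produced with the standard infiniband-diags tools; the output file name and grep pattern here are illustrative only:

    # run on a node attached to the fabric (smd in this thread); usually needs root
    sudo ibnetdiscover > ib-topology.txt
    # pull out the GUID / vendor / device records of the kind quoted above
    grep -E 'vendid|devid|sysimgguid' ib-topology.txt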

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Elken, Tom
users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Friday, May 19, 2017 12:16 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Many different errors with ompi version 2.1.1

Allan, i just noted smd has a Mellanox card, while other nodes have QLogic

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Allan, remember that InfiniBand is not Ethernet. You don't NEED to set up IPoIB interfaces. Two diagnostics, please, for you to run: ibnetdiscover and ibdiagnet. Let us please have the results of ibnetdiscover. On 19 May 2017 at 09:25, John Hearns wrote: > Gilles,
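
For reference, a minimal way to run the two requested diagnostics might look like the following; the package names and the need for root are assumptions, not stated in the thread:

    # both tools ship with the infiniband-diags / ibutils packages
    sudo ibnetdiscover   # dump the fabric topology seen from this HCA
    sudo ibdiagnet       # run a fabric health check and print a summary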

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Gilles, Allan, if the host 'smd' is acting as a cluster head node, it does not need to have an InfiniBand card. So you should be able to run jobs across the other nodes, which have QLogic cards. I may have something mixed up here; if so, I am sorry. If you want also to run jobs on the smd
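
A hedged illustration of that suggestion: keep smd out of the job by listing only the QLogic compute nodes in the hostfile. The node names and slot counts below are hypothetical:

    # hostfile "nodes" containing only the compute nodes, not smd
    cat > nodes <<'EOF'
    node01 slots=8
    node02 slots=8
    node03 slots=8
    EOF
    mpirun -np 16 --hostfile nodes ./my_mpi_app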

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Gilles Gouaillardet
Allan, I just noted smd has a Mellanox card, while other nodes have QLogic cards. mtl/psm works best for QLogic, while btl/openib (or mtl/mxm) works best for Mellanox, but these are not interoperable. Also, I do not think btl/openib can be used with QLogic cards (please someone correct me
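
As an illustrative sketch (command lines assumed, not taken from the thread), the transports Gilles mentions are selected with MCA parameters along these lines:

    # QLogic HCAs: the cm PML with the PSM MTL
    mpirun --mca pml cm --mca mtl psm -np 16 --hostfile nodes ./my_mpi_app
    # Mellanox HCAs: the ob1 PML with the openib BTL (or mtl/mxm instead)
    mpirun --mca pml ob1 --mca btl openib,vader,self -np 16 --hostfile nodes ./my_mpi_app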

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Gilles Gouaillardet
Allan,
- on which node is mpirun invoked?
- are you running from a batch manager?
- is there any firewall running on your nodes?
- how many interfaces are part of bond0?
The error is likely occurring when wiring up mpirun/orted. What if you mpirun -np 2 --hostfile nodes --mca
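
The quoted command is cut off in the archive, so the exact MCA parameters Gilles suggested are unknown; a common variant of this kind of wire-up sanity test forces everything over TCP on the bonded interface:

    # two ranks, plain TCP only, restricted to bond0 -- takes the IB stack out of the picture
    mpirun -np 2 --hostfile nodes --mca btl tcp,self --mca btl_tcp_if_include bond0 hostname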

[OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Allan Overstreet
I am experiencing many different errors with Open MPI version 2.1.1. I have had a suspicion that this might be related to the way the servers were connected and configured. Regardless, below is a diagram of how the servers are configured.
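
Before chasing the individual error messages, a quick sanity check (a sketch, assuming the hostfile "nodes" used later in the thread) is to launch something trivial across all servers:

    # non-MPI launch first: exercises only the ssh / orted wire-up
    mpirun -np 4 --hostfile nodes hostname
    # then a trivial MPI program, e.g. ring_c from the Open MPI examples/ directory
    mpirun -np 4 --hostfile nodes ./ring_c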