Hi William, 

A couple of other questions: 
 - Please share what your ompi configure line looks like. 
 - Please clarify which compat library or libraries you are referring to. There 
are some that are actually for the opposite case: making TS apps/libs run on 
Omnipath. 
 - As Gilles mentions, moving to a newer major OMPI version is advisable. If 
this is not possible, move to 1.10.7, which has many updates over 1.10.1. 
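
For the configure line, something like the following may help. This is only a sketch: it assumes the ompi_info binary from the same Open MPI build is first in your PATH, and the helper name ompi_build_summary is made up for illustration. It keeps the "Configure command line" entry and the list of MTL components that ompi_info reports.

```shell
# Hypothetical helper: read ompi_info output on stdin and keep only the
# configure line and the MTL components (psm, psm2, ...) that were built.
ompi_build_summary() {
    grep -i -E 'configure command line|MCA mtl:'
}

# Only runs when ompi_info is actually installed and on PATH.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | ompi_build_summary
fi
```

If no "MCA mtl: psm" line shows up, that build most likely has no PSM support compiled in.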

Thanks, 

_MAC


-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles 
Gouaillardet
Sent: Monday, January 22, 2018 3:31 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] OpenMPI with PSM on True Scale with OmniPath drivers

William,

In order to force PSM (aka Infinipath) you can

mpirun --mca pml cm --mca mtl psm ...

(Replace psm with psm2 for PSM2, aka Omnipath.)

You can also

mpirun --mca pml_base_verbose 10 --mca mtl_base_verbose 10 ...

in order to collect some logs.

Bottom line, pml/cm should be selected (instead of pml/ob1) and the appropriate 
mtl should be selected.
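
The two commands above can be combined into a single invocation. The following is a minimal sketch, not a tested recipe: the helper name psm_mca_args, the process count and ./my_app are placeholders, and the script prints the command instead of running it.

```shell
# Hypothetical helper: print the MCA arguments that force pml/cm with the
# given MTL ("psm" for True Scale, "psm2" for Omnipath) and enable the
# verbose selection logs mentioned above.
psm_mca_args() {
    mtl="${1:-psm}"
    case "$mtl" in
        psm|psm2) ;;
        *) echo "unsupported mtl: $mtl" >&2; return 1 ;;
    esac
    printf '%s\n' "--mca pml cm --mca mtl $mtl --mca pml_base_verbose 10 --mca mtl_base_verbose 10"
}

# Show the full command line; -np 2 and ./my_app are placeholders.
echo "mpirun $(psm_mca_args psm) -np 2 ./my_app"
```

In the verbose output, check which pml and mtl components are reported as selected.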


On top of that, you might need to rebuild Open MPI if some user level library 
has been changed.

Note that Open MPI 1.10 is now legacy, and I strongly encourage you to upgrade 
to 2.1.x or 3.0.x.


Cheers,

Gilles


William Hay <w....@ucl.ac.uk> wrote:
>We have a couple of clusters with Qlogic Infinipath/Intel TrueScale 
>networking.  While testing a kernel upgrade we found that the TrueScale 
>drivers will no longer build against recent RHEL kernels.  Intel tells 
>us that the OmniPath drivers will work for TrueScale adapters, so we 
>installed those.  Basic functionality appears fine; however, we are having 
>trouble getting OpenMPI to work.
>
>Using our existing builds of OpenMPI 1.10, jobs receive lots of signal
>11s and crash (output attached).
>
>If we modify LD_LIBRARY_PATH to point to the directory containing the 
>compatibility library provided as part of the OmniPath drivers, it 
>instead produces complaints about not finding /dev/hfi1_0, which exists 
>on our cluster with actual OmniPath but not on the clusters with 
>TrueScale (output also attached).
>
>We had a similar issue with Intel MPI but there it was possible to get 
>it to work by passing a -psm option to mpirun.  That combined with the 
>mention of PSM2 in the output when complaining about /dev/hfi1_0 makes 
>us think OpenMPI is trying to run with PSM2 rather than the original 
>PSM and failing because that isn't supported by TrueScale.
>
>We hoped that there would be an mca parameter or combination of 
>parameters that would resolve this issue but while Googling has turned 
>up a few things that look like they would force the use of PSM over 
>PSM2 none of them seem to make a difference.
>
>Any suggestions?
>
>William
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users