On 08/05/2020 at 21:56, Prentice Bisbal via users wrote:
>
> We often get the following errors when more than one job runs on the
> same compute node. We are using Slurm with OpenMPI. The IB cards are
> QLogic using PSM:
>
> 10698ipath_userinit: assign_context command failed: Network is down
> node01.10698can't open /dev/ipath, network down (err=26)
> node01.10703ipath_userinit: assign_context command failed: Network is down
> node01.10703can't open /dev/ipath, network down (err=26)
> node01.10701ipath_userinit: assign_context command failed: Network is down
> node01.10701can't open /dev/ipath, network down (err=26)
> node01.10700ipath_userinit: assign_context command failed: Network is down
> node01.10700can't open /dev/ipath, network down (err=26)
> node01.10697ipath_userinit: assign_context command failed: Network is down
> node01.10697can't open /dev/ipath, network down (err=26)
> --------------------------------------------------------------------------
> PSM was unable to open an endpoint. Please make sure that the network
> link is
> active on the node and the hardware is functioning.
>
> Error: Could not detect network connectivity
> --------------------------------------------------------------------------
>
> Any Ideas how to fix this?
>
> -- 
> Prentice 


Hi Prentice,

This is not Open MPI related but a limitation of your hardware. I don't
have all the details, but I believe this occurs when several jobs share
the same node and the nodes have a large number of cores (> 14). If this
is the case:

On QLogic hardware (which I am still using) each HBA provides 16
hardware contexts (channels) for communication and, if I remember
correctly what I read many years ago, 2 of them are reserved for the
system. When an MPI application launches, each process of a job requests
its own dedicated context if one is available; otherwise the processes
share ALL the available contexts. So if a second job starts on the same
node, no context remains available for it.
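
To make the arithmetic concrete (taking those figures of 16 contexts
per HBA and 2 reserved as correct), on a 20-core node:

16 contexts - 2 reserved = 14 usable contexts
job A launches 14 ranks  -> each rank takes a dedicated context, 0 left
job B starts on the node -> no context left -> "can't open /dev/ipath"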

To avoid this situation I force the contexts to be shared, 2 MPI
processes per context (my nodes have 20 cores). You can set this with a
simple environment variable. On all my cluster nodes I create the file:

/etc/profile.d/ibsetcontext.sh

And it contains:

# allow 2 processes to share a hardware MPI context
# in InfiniBand with PSM
export PSM_RANKS_PER_CONTEXT=2
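
This only helps if the variable is really present in the environment of
the MPI processes; with Slurm the submission environment is propagated
to the job by default, so it is worth having the file on the nodes you
submit from as well. A quick check from a login node (a minimal sketch,
assuming Slurm's srun is used to launch jobs):

srun -N1 -n1 printenv PSM_RANKS_PER_CONTEXT
# should print: 2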

Of course, if some people manage to oversubscribe the cores (more than
one process per core) the problem could arise again, but we do not
oversubscribe.
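
For example, with 2 processes per core on my 20-core nodes:

20 cores x 2 processes per core = 40 ranks
40 ranks / 2 ranks per context  = 20 contexts needed > ~14 available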

Hope this helps.

Patrick
