Also, you probably want to add "vader" into your BTL specification.  Although 
the name is counter-intuitive, "vader" in Open MPI v3.x and v4.x is the shared 
memory transport.  Hence, if you run with "btl=tcp,self", you are only allowing 
MPI processes to talk via the TCP stack or process loopback (which, by 
definition, is only for a process to talk to itself) -- even if they are on the 
same node.

Instead, if you run with "btl=tcp,vader,self", then MPI processes can talk via 
TCP, process loopback, or shared memory.  Hence, if two MPI processes are on 
the same node, they can use shared memory to communicate, which is 
significantly faster than TCP.

NOTE: In the upcoming Open MPI v5.0.x, the name "vader" has (finally) been 
deprecated and replaced with the more intuitive name "sm".  While 
"btl=tcp,vader,self" will still work in v5.0.x for backwards compatibility with 
v3.x and v4.x, "btl=tcp,sm,self" is preferred for v5.0.x and forward.
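
For reference, setting this once in the system-wide MCA parameter file uses the
usual one-"name = value"-per-line format (a sketch; adjust the component list
to whatever you actually want to allow):

```
# $OMPI/etc/openmpi-mca-params.conf
# Same effect as running: mpirun --mca btl tcp,vader,self ...
btl = tcp,vader,self
```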

The problem you were seeing was the openib BTL component complaining that, as 
the help message described, the environment was not set up to allow it to use 
the qib0 device.  Hence, it seems like you have a secondary, HPC-class network 
available (which could be faster / more efficient than TCP), but it isn't 
configured properly in your environment.  You might want to follow the help 
message's suggestion to raise the memlock limits, and then see if using the 
qib0 interfaces yields better performance.
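
As a quick sketch of what to check (the limits.conf lines below are the usual
pam_limits convention and may differ on your system):

```shell
# Show the current max locked-memory limit for this shell, in KB
# (the help message below reported 65536, i.e. 64 MB).
ulimit -l

# A system administrator would typically raise it for all users by adding
# lines like these to /etc/security/limits.conf (assumption: pam_limits
# is in use), then logging in again:
#
#   *   soft   memlock   unlimited
#   *   hard   memlock   unlimited
```

Note that limits set for interactive logins don't always propagate to jobs 
launched by a resource manager daemon, so it is worth checking `ulimit -l` 
from inside an actual batch job as well.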

--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet 
via users <users@lists.open-mpi.org>
Sent: Tuesday, November 29, 2022 3:36 AM
To: Gestió Servidors via users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gil...@rist.or.jp>
Subject: Re: [OMPI users] Question about "mca" parameters

Hi,


Simply add


btl = tcp,self


If the openib error message persists, try also adding

osc_rdma_btls = ugni,uct,ucp

or simply

osc = ^rdma
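
In $OMPI/etc/openmpi-mca-params.conf these take the usual one "name = value" 
per line form; for example (a sketch combining the settings discussed in this 
thread):

```
btl = tcp,self
btl_openib_allow_ib = 0
# only if the openib message persists:
# osc = ^rdma
```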



Cheers,


Gilles

On 11/29/2022 5:16 PM, Gestió Servidors via users wrote:
>
> Hi,
>
> If I run “mpirun --mca btl tcp,self --mca allow_ib 0 -n 12
> ./my_program”, I would like to disable some “extra” info in the output
> file like:
>
> --------------------------------------------------------------------------
> The OpenFabrics (openib) BTL failed to initialize while trying to
> allocate some locked memory.  This typically can indicate that the
> memlock limits are set too low.  For most HPC installations, the
> memlock limits should be set to "unlimited".  The failure occured
> here:
>
>   Local host:    clus11
>   OMPI source:   btl_openib.c:757
>   Function:      opal_free_list_init()
>   Device:        qib0
>   Memlock limit: 65536
>
> You may need to consult with your system administrator to get this
> problem fixed.  This FAQ entry on the Open MPI web site may also be
> helpful:
>
>   http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
> --------------------------------------------------------------------------
>
> [clus11][[33029,1],0][btl_openib.c:1062:mca_btl_openib_add_procs]
> could not prepare openib device for use
> [... the same message is repeated for each of the 12 MPI ranks ...]
>
> or like:
>
> --------------------------------------------------------------------------
> By default, for Open MPI 4.0 and later, infiniband ports on a device
> are not used by default.  The intent is to use UCX for these devices.
> You can override this policy by setting the btl_openib_allow_ib MCA
> parameter to true.
>
>   Local host:              clus11
>   Local adapter:           qib0
>   Local port:              1
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: There was an error initializing an OpenFabrics device.
>
>   Local host:   clus11
>   Local device: qib0
> --------------------------------------------------------------------------
>
> So, now, I would like to force those parameters in the file
> $OMPI/etc/openmpi-mca-params.conf. I have run “ompi_info --param all
> all --level 9” to get all parameters, but I don’t know exactly which
> parameters I need to add to $OMPI/etc/openmpi-mca-params.conf, or what
> their correct syntax is, to always force “--mca btl tcp,self --mca
> allow_ib 0”. I have already added “btl_openib_allow_ib = “ and it
> works, but for the parameter “--mca btl tcp,self”, what would be the
> correct syntax in the $OMPI/etc/openmpi-mca-params.conf file?
>
> Thanks!!
>
