Also, you probably want to add "vader" to your BTL specification. Although the name is counter-intuitive, "vader" in Open MPI v3.x and v4.x is the shared memory transport. Hence, if you run with "btl=tcp,self", you are only allowing MPI processes to talk via the TCP stack or process loopback (which, by definition, is only for a process to talk to itself) -- even if they are on the same node.
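
For reference, here is a minimal sketch of how that restricted setting might look in an MCA parameters file. The path is the one from your question; the file takes one "name = value" pair per line, and "#" starts a comment.

```
# $OMPI/etc/openmpi-mca-params.conf
# Equivalent of "mpirun --mca btl tcp,self ...": TCP and process
# loopback only, so processes on the same node have no shared
# memory transport available.
btl = tcp,self
```

The same value can also be set for a single run through the environment variable OMPI_MCA_btl=tcp,self, or on the mpirun command line with "--mca btl tcp,self".
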
Instead, if you run with "btl=tcp,vader,self", then MPI processes can talk via TCP, process loopback, or shared memory. Hence, if two MPI processes are on the same node, they can use shared memory to communicate, which is significantly faster than TCP.

NOTE: In the upcoming Open MPI v5.0.x, the name "vader" has (finally) been deprecated and replaced with the more intuitive name "sm". While "btl=tcp,vader,self" will work fine in v5.0.x for backwards compatibility with v4.x and v3.x, "btl=tcp,sm,self" is preferred for v5.0.x and forward.

The problem you were seeing was because the openib BTL component was complaining that, as the help message described, the environment was not set up correctly to allow using the qib0 device. Hence, it seems like you have a secondary / HPC-quality network available (which could be faster / more efficient than TCP), but it isn't configured properly in your environment. You might want to investigate the suggestion from the help message to set the memlock limits correctly, and see if using the qib0 interfaces would yield better performance.

--
Jeff Squyres
jsquy...@cisco.com

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet via users <users@lists.open-mpi.org>
Sent: Tuesday, November 29, 2022 3:36 AM
To: Gestió Servidors via users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gil...@rist.or.jp>
Subject: Re: [OMPI users] Question about "mca" parameters

Hi,

Simply add

btl = tcp,self

If the openib error message persists, try also adding

osc_rdma_btls = ugni,uct,ucp

or simply

osc = ^rdma

Cheers,

Gilles

On 11/29/2022 5:16 PM, Gestió Servidors via users wrote:
> Hi,
>
> If I run “mpirun --mca btl tcp,self --mca allow_ib 0 -n 12
> ./my_program”, I manage to suppress some “extra” output in the
> output file, such as:
>
> The OpenFabrics (openib) BTL failed to initialize while trying to
> allocate some locked memory. This typically can indicate that the
> memlock limits are set too low. For most HPC installations, the
> memlock limits should be set to "unlimited". The failure occured
> here:
>
>   Local host:    clus11
>   OMPI source:   btl_openib.c:757
>   Function:      opal_free_list_init()
>   Device:        qib0
>   Memlock limit: 65536
>
> You may need to consult with your system administrator to get this
> problem fixed. This FAQ entry on the Open MPI web site may also be
> helpful:
>
>   http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
> --------------------------------------------------------------------------
> [clus11][[33029,1],0][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],1][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],9][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],8][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],2][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],6][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],10][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],11][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],5][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],3][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],4][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
> [clus11][[33029,1],7][btl_openib.c:1062:mca_btl_openib_add_procs] could not prepare openib device for use
>
> or like
>
> By default, for Open MPI 4.0 and later, infiniband ports on a device
> are not used by default. The intent is to use UCX for these devices.
> You can override this policy by setting the btl_openib_allow_ib MCA parameter
> to true.
>
>   Local host:    clus11
>   Local adapter: qib0
>   Local port:    1
>
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: There was an error initializing an OpenFabrics device.
>
>   Local host:   clus11
>   Local device: qib0
> --------------------------------------------------------------------------
>
> So now I would like to force those parameters in the file
> $OMPI/etc/openmpi-mca-params.conf. I have run “ompi_info --param all
> all --level 9” to list all parameters, but I don’t know exactly which
> parameters I need to add to $OMPI/etc/openmpi-mca-params.conf, or what
> the correct syntax is to always force “--mca btl tcp,self
> --mca allow_ib 0”. I have already added “btl_openib_allow_ib = “ and
> it works, but for the parameter “--mca btl tcp,self”, what would be the
> correct syntax in the $OMPI/etc/openmpi-mca-params.conf file?
>
> Thanks!!