This may be an obvious suggestion, but if the latency/bandwidth numbers are
bad, check that you are really running over the interface you think you are.
You could be falling back to running over Ethernet.
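
One way to rule out a silent fallback is to make the job fail when UCX cannot
be selected, rather than letting Open MPI pick another transport. Below is a
minimal sketch in Python that only assembles such an mpirun command line; it
does not launch anything. The helper name build_mpirun_command, the benchmark
path ./osu_latency and the device name mlx5_0:1 are my assumptions for
illustration, not taken from your setup. Running "ucx_info -d" on a compute
node should list the devices UCX actually sees. You would still need to add
your own host or scheduler options.

```python
import shlex


def build_mpirun_command(benchmark, num_procs=2, net_device=None):
    """Return an mpirun argv that insists on the UCX PML.

    With "--mca pml ucx" Open MPI should abort if the UCX PML cannot be
    used, instead of quietly choosing a different transport.
    """
    cmd = [
        "mpirun", "-np", str(num_procs),
        "--mca", "pml", "ucx",              # require the UCX PML
        "--mca", "pml_base_verbose", "10",  # log PML selection details
    ]
    if net_device is not None:
        # UCX_NET_DEVICES restricts UCX to the named device(s).
        # The value passed in is an assumption; check ucx_info -d.
        cmd += ["-x", f"UCX_NET_DEVICES={net_device}"]
    cmd.append(benchmark)
    return cmd


if __name__ == "__main__":
    # Hypothetical benchmark binary and device name.
    argv = build_mpirun_command("./osu_latency", net_device="mlx5_0:1")
    print(shlex.join(argv))
```

If the job starts with these options and the numbers are still poor, the
problem is probably not an Ethernet fallback at the PML level, and the verbose
output should show which component was chosen.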

On Mon, 28 Feb 2022 at 20:10, Angel de Vicente via users <
users@lists.open-mpi.org> wrote:

> Hello,
>
> "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> writes:
>
> > I'd recommend against using Open MPI v3.1.0 -- it's quite old.  If you
> > have to use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which
> > has all the rolled-up bug fixes on the v3.1.x series.
> >
> > That being said, Open MPI v4.1.2 is the most current.  Open MPI v4.1.2
> > does restrict which versions of UCX it uses because there are bugs in
> > the older versions of UCX.  I am not intimately familiar with UCX --
> > you'll need to ask Nvidia for support there -- but I was under the
> > impression that it's just a user-level library, and you could certainly
> > install your own copy of UCX to use with your compilation of Open MPI.
> > I.e., you're not restricted to whatever UCX is installed in the cluster
> > system-default locations.
>
> I did follow your advice, so I compiled my own version of UCX (1.11.2)
> and Open MPI v4.1.1, but for some reason the latency/bandwidth numbers
> are really bad compared to the previous ones. Something is wrong, but
> I am not sure how to debug it.
>
> > I don't know why you're getting MXM-specific error messages; those don't
> > appear to be coming from Open MPI (especially since you configured Open
> > MPI with --without-mxm).  If you can upgrade to Open MPI v4.1.2 and the
> > latest UCX, see if you are still getting those MXM error messages.
>
> In this latest attempt, yes, the MXM error messages are still there.
>
> Cheers,
> --
> Ángel de Vicente
>
> Tel.: +34 922 605 747
> Web.: http://research.iac.es/proyecto/polmag/
>
