Sorry if I did not make my intent clear.

I was basically suggesting hacking the Open MPI and PMIx wrappers 
around hostname() so that they remove the problematic underscores and 
make the regx components happy again.
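
To make the idea concrete, here is a minimal sketch. It assumes the 
wrappers ultimately call gethostname(2); the helper names 
strip_underscores() and sanitized_gethostname() are hypothetical and 
are not part of Open MPI or PMIx. It replaces each underscore with a 
hyphen rather than dropping it, so that two distinct node names are 
less likely to collapse into the same string.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI or PMIx API): replace every
 * underscore in a NUL-terminated hostname with a hyphen, in place. */
static void strip_underscores(char *name)
{
    for (char *p = name; *p != '\0'; ++p) {
        if (*p == '_') {
            *p = '-';
        }
    }
}

/* Hypothetical wrapper with the same return convention as gethostname(2)
 * (0 on success, -1 on failure); on success the result is always
 * NUL-terminated and contains no underscore. */
static int sanitized_gethostname(char *buf, size_t len)
{
    if (buf == NULL || len == 0) {
        return -1;
    }
    if (gethostname(buf, len) != 0) {
        return -1;
    }
    /* gethostname() may leave the buffer unterminated on truncation. */
    buf[len - 1] = '\0';
    strip_underscores(buf);
    return 0;
}
```

The sanitized name would only be what Open MPI sees internally, so every 
node has to apply the same mapping for the names to stay consistent.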

Cheers,

Gilles

----- Original Message -----
> I think the files suggested by Gilles are more about the underlying call to get the hostname; those won't be problematic.
> 
> The regex Open MPI modules are where Open MPI is running into a problem with your hostnames (i.e., your hostnames don't fit into Open MPI's expectations of the format of the hostname).  I'm surprised that using the naive module (instead of the fwd module) doesn't solve your problem.  ...oh shoot, I see why.  It's because I had a typo in what I suggested to you.
> 
> Please try:  mpirun --mca regx naive ...
> 
> (i.e., "regx", not "regex")
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> 
> ________________________________________
> From: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
> Sent: Tuesday, June 21, 2022 12:10 PM
> To: Jeff Squyres (jsquyres); Open MPI Users
> Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster
> 
> Hi Jeff,
> 
> Unfortunately, the workaround with "--mca regex naive" does not change the behaviour. I'm going to investigate the OpenMPI source files as suggested by Gilles.
> 
> Patrick
> 
> Le 16/06/2022 à 17:43, Jeff Squyres (jsquyres) a écrit :
> 
> Ah; this is a slightly different error than what Gilles was guessing from your prior description.  This is what you're running into:
> https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134
> 
> Try running with:
> 
> mpirun --mca regex naive ...
> 
> Specifically: the "fwd" regex component is selected by default, but it has certain expectations about the format of hostnames.  Try using the "naive" regex component, instead.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> 
> ________________________________________
> From: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
> Sent: Thursday, June 16, 2022 9:48 AM
> To: Jeff Squyres (jsquyres); Open MPI Users
> Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster
> 
> Hi Gilles and Jeff,
> 
> @Gilles I will have a look at these files, thanks.
> 
> @Jeff this is the error message (screen dump attached), and of course the node names do not comply with the standard.
> 
> Patrick
> 
> Le 16/06/2022 à 14:30, Jeff Squyres (jsquyres) a écrit :
> 
> What exactly is the error that is occurring?
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> 
> ________________________________________
> From: users <users-boun...@lists.open-mpi.org> on behalf of Patrick Begou via users <users@lists.open-mpi.org>
> Sent: Thursday, June 16, 2022 3:21 AM
> To: Open MPI Users
> Cc: Patrick Begou
> Subject: [OMPI users] OpenMPI and names of the nodes in a cluster
> 
> Hi all,
> 
> we are facing a serious problem with OpenMPI (4.0.2) that we have
> deployed on a cluster. We do not manage this large cluster, and the names
> of the nodes do not comply with Internet standards for protocols: they
> contain a "_" (underscore) character.
> 
> So OpenMPI complains about this and does not run.
> 
> I've tried to use IP addresses instead of host names in the host file
> without any success.
> 
> Is there a known workaround for this? Asking the administrators to
> change the node names on this large cluster may be difficult.
> 
> Thanks
> 
> Patrick