I think the files suggested by Gilles are more about the underlying call to get 
the hostname; those won't be problematic.

The Open MPI regx modules are where Open MPI is running into a problem with 
your hostnames (i.e., your hostnames don't fit Open MPI's expectations of 
the hostname format).  I'm surprised that using the naive module 
(instead of the fwd module) doesn't solve your problem.  ...oh shoot, I see 
why.  It's because I had a typo in what I suggested to you.

Please try:  mpirun --mca regx naive ...

(i.e., "regx", not "regex")
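
For anyone scripting their launches, here is a minimal Python sketch of the two usual ways to pass this MCA parameter: on the mpirun command line, or through the OMPI_MCA_<name> environment variable convention. The program name ./my_app, the process count, and the helper names mpirun_command and mpirun_env are hypothetical and only for illustration. The sketch builds the command and environment without launching anything.

```python
import os
import shlex

def mpirun_command(program, nprocs, regx="naive"):
    """Build an mpirun argument list that selects the given regx component."""
    return ["mpirun", "--mca", "regx", regx, "-np", str(nprocs), program]

def mpirun_env(regx="naive", base=None):
    """Copy an environment and set the same MCA parameter via OMPI_MCA_regx."""
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_regx"] = regx
    return env

# Hypothetical program and process count, for illustration only.
cmd = mpirun_command("./my_app", 4)
env = mpirun_env()
print(shlex.join(cmd))
```

Either form should be enough on its own; the list in cmd can be handed to subprocess.run, optionally with env=env, once the names are adapted to the real job.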

--
Jeff Squyres
jsquy...@cisco.com

________________________________________
From: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
Sent: Tuesday, June 21, 2022 12:10 PM
To: Jeff Squyres (jsquyres); Open MPI Users
Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi Jeff,

Unfortunately, the workaround with "--mca regex naive" does not change the 
behaviour. I'm going to investigate the Open MPI source files as suggested by 
Gilles.

Patrick

On 16/06/2022 at 17:43, Jeff Squyres (jsquyres) wrote:

Ah; this is a slightly different error than what Gilles was guessing from your 
prior description.  This is what you're running into: 
https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134

Try running with:

mpirun --mca regex naive ...

Specifically: the "fwd" regex component is selected by default, but it has 
certain expectations about the format of hostnames.  Try using the "naive" 
regex component, instead.

--
Jeff Squyres
jsquy...@cisco.com

________________________________________
From: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
Sent: Thursday, June 16, 2022 9:48 AM
To: Jeff Squyres (jsquyres); Open MPI Users
Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi Gilles and Jeff,

@Gilles I will have a look at these files, thanks.

@Jeff this is the error message (screen dump attached) and of course the node 
names do not conform to the standard.

Patrick


On 16/06/2022 at 14:30, Jeff Squyres (jsquyres) wrote:

What exactly is the error that is occurring?

--
Jeff Squyres
jsquy...@cisco.com

________________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Patrick Begou via 
users <users@lists.open-mpi.org>
Sent: Thursday, June 16, 2022 3:21 AM
To: Open MPI Users
Cc: Patrick Begou
Subject: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi all,

we are facing a serious problem with OpenMPI (4.0.2) that we have
deployed on a cluster. We do not manage this large cluster and the names
of the nodes do not agree with Internet standards for protocols: they
contain a "_" (underscore) character.
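
To see which node names are affected, a small Python sketch can flag hostnames that fall outside the usual RFC 952 / RFC 1123 rule (letters, digits, and hyphens in each dot-separated label). This is only an approximation of the standard, not a reproduction of the checks Open MPI itself performs, and the node names in the example are hypothetical.

```python
import re

# One DNS label: letters, digits, hyphens; 1 to 63 characters;
# no leading or trailing hyphen. Underscores are not allowed.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_standard_hostname(name):
    """Return True if every dot-separated label of name is a valid label."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))

def nonconforming(hostnames):
    """Return the hostnames that do not follow the rule above, in order."""
    return [h for h in hostnames if not is_standard_hostname(h)]

# Hypothetical node names, for illustration only.
nodes = ["node-01", "node_02", "gpu_node_3.cluster.example"]
print(nonconforming(nodes))
```

Feeding it the names from a host file gives a quick list of the nodes whose names contain an underscore or another non-standard character.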

So OpenMPI complains about this and does not run.

I've tried using IP addresses instead of host names in the host file, without
any success.

Is there a known workaround for this? Asking the administrators to
change the node names on this large cluster may be difficult.

Thanks

Patrick





