Sorry if I did not make my intent clear. I was basically suggesting hacking the Open MPI and PMIx wrappers around hostname() to remove the problematic underscores, which would make the regx components a happy panda again.
Cheers,

Gilles

----- Original Message -----
> I think the files suggested by Gilles are more about the underlying call to get the hostname; those won't be problematic.
>
> Open MPI's regex modules are where Open MPI is running into a problem with your hostnames (i.e., your hostnames don't match Open MPI's expectations about the hostname format). I'm surprised that using the naive module (instead of the fwd module) doesn't solve your problem. ...Oh shoot, I see why: I had a typo in what I suggested to you.
>
> Please try: mpirun --mca regx naive ...
>
> (i.e., "regx", not "regex")
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ________________________________________
> From: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
> Sent: Tuesday, June 21, 2022 12:10 PM
> To: Jeff Squyres (jsquyres); Open MPI Users
> Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster
>
> Hi Jeff,
>
> Unfortunately, the workaround with "--mca regex naive" does not change the behaviour. I'm going to investigate the Open MPI source files as suggested by Gilles.
>
> Patrick
>
> On 16/06/2022 at 17:43, Jeff Squyres (jsquyres) wrote:
>
> Ah; this is a slightly different error than what Gilles was guessing from your prior description. This is what you're running into: https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134
>
> Try running with:
>
> mpirun --mca regex naive ...
>
> Specifically: the "fwd" regex component is selected by default, but it has certain expectations about the format of hostnames. Try using the "naive" regex component instead.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ________________________________________
> From: Patrick Begou <patrick.be...@univ-grenoble-alpes.fr>
> Sent: Thursday, June 16, 2022 9:48 AM
> To: Jeff Squyres (jsquyres); Open MPI Users
> Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster
>
> Hi Gilles and Jeff,
>
> @Gilles, I will have a look at these files, thanks.
>
> @Jeff, this is the error message (screen dump attached), and of course the node names do not conform to the standard.
>
> Patrick
>
> [screen dump attached]
>
> On 16/06/2022 at 14:30, Jeff Squyres (jsquyres) wrote:
>
> What exactly is the error that is occurring?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ________________________________________
> From: users <users-boun...@lists.open-mpi.org> on behalf of Patrick Begou via users <users@lists.open-mpi.org>
> Sent: Thursday, June 16, 2022 3:21 AM
> To: Open MPI Users
> Cc: Patrick Begou
> Subject: [OMPI users] OpenMPI and names of the nodes in a cluster
>
> Hi all,
>
> We are facing a serious problem with Open MPI 4.0.2, which we have
> deployed on a cluster. We do not manage this large cluster, and the names
> of the nodes do not conform to Internet standards for host names: they
> contain an "_" (underscore) character.
>
> As a result, Open MPI complains and does not run.
>
> I've tried using IP addresses instead of host names in the hostfile,
> without any success.
>
> Is there a known workaround? Asking the administrators to change the
> node names on this large cluster may be difficult.
>
> Thanks
>
> Patrick