Re: [OMPI users] Can't start jobs with srun.

2020-05-07 Thread Patrick Bégou via users
On 07/05/2020 at 11:42, John Hearns via users wrote: > Patrick, I am sure that you have asked Dell for support on this issue? No, I didn't :-(. I only had access to these servers for a short time to run a benchmark, and the workaround was enough. I'm not using Slurm but a local scheduler (OAR), so the p

Re: [OMPI users] Can't start jobs with srun.

2020-05-07 Thread John Hearns via users
Patrick, I am sure that you have asked Dell for support on this issue? On Sun, 26 Apr 2020 at 18:09, Patrick Bégou via users <users@lists.open-mpi.org> wrote: > I have also this problem on servers I'm benching at DELL's lab with > OpenMPI-4.0.3. I've tried a new build of OpenMPI with "--with-pm

Re: [OMPI users] Can't start jobs with srun.

2020-04-27 Thread Daniel Letai via users
I know it's not supposed to matter, but have you tried building both OMPI and Slurm against the same PMIx? That is: first build PMIx, then build Slurm with-pmix, and then OMPI with both slurm and pmix=external? On 23/04/2020 17:00, Prentice Bi

Re: [OMPI users] Can't start jobs with srun.

2020-04-27 Thread Riebs, Andy via users
Lost a line… Also helpful to check $ srun -N3 which ompi_info From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Riebs, Andy via users Sent: Monday, April 27, 2020 10:59 AM To: Open MPI Users Cc: Riebs, Andy Subject: Re: [OMPI users] Can't start jobs with srun. Y’kn

Re: [OMPI users] Can't start jobs with srun.

2020-04-27 Thread Riebs, Andy via users
Y’know, a quick check on versions and PATHs might be a good idea here. I suggest something like $ srun -N3 ompi_info |& grep "MPI repo" to confirm that all nodes are running the same version of OMPI. From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Prentice Bisbal via us
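That check can be made pass/fail. A minimal bash sketch, assuming srun is run without --label so that identical installs print identical lines; the function name check_uniform is hypothetical and not something from this thread. It reads whatever the per-node command prints and reports whether every node agreed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from this thread): reads per-node output on stdin
# and reports whether every node printed the same thing. Assumes srun is run
# without --label, so matching installs produce identical lines.
check_uniform() {
  local distinct count
  # Trim surrounding whitespace, drop blank lines, keep one copy of each line.
  distinct=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep -v '^$' | sort -u)
  # Count the remaining non-empty lines (0 if nothing was printed at all).
  count=$(printf '%s\n' "$distinct" | grep -c .)
  if [ "$count" -eq 0 ]; then
    echo "NO OUTPUT: no node printed anything"
    return 1
  fi
  if [ "$count" -eq 1 ]; then
    echo "OK: all nodes agree"
    return 0
  fi
  echo "MISMATCH: $count different values:"
  printf '%s\n' "$distinct"
  return 1
}

# Example usage inside a 3-node allocation:
#   srun -N3 ompi_info |& grep "MPI repo" | check_uniform
#   srun -N3 which ompi_info | check_uniform
```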

Re: [OMPI users] Can't start jobs with srun.

2020-04-26 Thread Ralph Castain via users
It is entirely possible that the PMI2 support in OMPI v4 is broken - I doubt it is used or tested very much as pretty much everyone has moved to PMIx. In fact, we completely dropped PMI-1 and PMI-2 from OMPI v5 for that reason. I would suggest building Slurm with PMIx v3.1.5 (https://github.com
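To see which PMI paths a particular Open MPI build actually contains, one option is to list the components of its pmix framework as reported by ompi_info. A minimal bash sketch; the function name is hypothetical, and the line format it parses ("MCA pmix: <name> (...)") is assumed from typical 4.0.x ompi_info output rather than taken from this thread. Names such as pmix3x (PMIx) or s2 (PMI-2) are what we would expect to see, but treat that mapping as an assumption to verify on your own build.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the component names of Open MPI's "pmix" MCA
# framework, one per line, from ompi_info output on stdin. Assumes lines of
# the form "  MCA pmix: <name> (MCA vX, API vY, Component vZ)".
ompi_pmix_components() {
  sed -n 's/^[[:space:]]*MCA pmix:[[:space:]]*\([A-Za-z0-9_]*\).*/\1/p'
}

# Example usage, locally and then across every node of the allocation
# (each component should appear once per node):
#   ompi_info | ompi_pmix_components
#   srun -N3 ompi_info | ompi_pmix_components | sort | uniq -c
```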

Re: [OMPI users] Can't start jobs with srun.

2020-04-26 Thread Patrick Bégou via users
I also have this problem on servers I'm benchmarking at Dell's lab with OpenMPI-4.0.3. I've tried a new build of OpenMPI with "--with-pmi2". No change. Finally, my workaround in the Slurm script was to launch my code with mpirun. As mpirun was only finding one slot per node, I used "--oversubscri

Re: [OMPI users] Can't start jobs with srun.

2020-04-24 Thread Riebs, Andy via users
Prentice, have you tried something trivial, like "srun -N3 hostname", to rule out non-OMPI problems? Andy -----Original Message----- From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Prentice Bisbal via users Sent: Friday, April 24, 2020 2:19 PM To: Ralph Castain ; Open MPI Use
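Andy's suggestion can also be turned into a pass/fail check. A minimal bash sketch with a hypothetical function name: feed it the output of "srun -N3 hostname" plus the node count you asked for, and it checks that exactly that many distinct hosts answered. Since hostname does not touch MPI at all, a failure here points at Slurm or the allocation rather than at Open MPI.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Usage:  srun -N3 hostname | check_node_count 3
# Succeeds only if exactly the expected number of distinct hostnames arrived.
check_node_count() {
  local expected=$1 got
  # Ignore blank lines, de-duplicate, count.
  got=$(grep -v '^[[:space:]]*$' | sort -u | wc -l)
  got=${got//[[:space:]]/}   # some wc implementations pad the number with spaces
  if [ "$got" -eq "$expected" ]; then
    echo "OK: $got distinct hosts"
    return 0
  fi
  echo "FAIL: expected $expected distinct hosts, got $got"
  return 1
}
```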

Re: [OMPI users] Can't start jobs with srun.

2020-04-23 Thread Ralph Castain via users
Is Slurm built with PMIx support? Did you tell srun to use it? > On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users > wrote: > > I'm using OpenMPI 4.0.3 with Slurm 19.05.5 I'm testing the software with a > very simple hello, world MPI program that I've used reliably for years. When > I
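Both questions can be answered from the shell: "srun --mpi=list" prints the MPI plugin types the installed Slurm supports, and "srun --mpi=pmix" requests PMIx for a single launch. A minimal bash sketch follows; the helper name and the ./hello_world binary are placeholders, and because the --mpi=list output format differs between Slurm releases, the helper only looks for a "pmix" or "pmix_vN" token rather than parsing a fixed layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if `srun --mpi=list` output (on stdin)
# mentions a PMIx plugin ("pmix" or a versioned name such as "pmix_v3").
slurm_has_pmix() {
  # Split on spaces, colons and tabs so "srun: pmix_v3" yields the token "pmix_v3".
  tr -s ' :\t' '\n\n\n' | grep -qE '^pmix(_v[0-9]+)?$'
}

# Example usage inside a job script (./hello_world is a placeholder):
#   if srun --mpi=list 2>&1 | slurm_has_pmix; then
#     srun --mpi=pmix ./hello_world
#   else
#     echo "this Slurm was not built with PMIx support" >&2
#   fi
```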

[OMPI users] Can't start jobs with srun.

2020-04-23 Thread Prentice Bisbal via users
I'm using OpenMPI 4.0.3 with Slurm 19.05.5. I'm testing the software with a very simple hello, world MPI program that I've used reliably for years. When I submit the job through Slurm and use srun to launch the job, I get these errors: *** An error occurred in MPI_Init *** on a NULL communicat