Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-25 Thread Matthias Leopold via users
Hi Gilles, I'm indeed using srun, I didn't have luck using mpirun yet. Are options 2 + 3 of your list really different things? As far as I understand now, I need "Open MPI with PMI support", THEN I can use srun with PMIx. Right now using "srun --mpi=pmix(_v3)" gives the error mentioned below.

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-25 Thread Gilles Gouaillardet via users
Matthias, Thanks for the clarifications. Unfortunately, I cannot connect the dots and I must be missing something. If I recap correctly: - SLURM has builtin PMIx support - Open MPI has builtin PMIx support - srun explicitly requires PMIx (srun --mpi=pmix_v3 ...) - and yet Open MPI issues an

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-25 Thread Matthias Leopold via users
PMIx library version used by SLURM is 3.2.3 On 25.01.22 at 11:04, Gilles Gouaillardet wrote: PMIx library version used by SLURM

[OMPI users] Gadget2 error 818 when using more than 1 process?

2022-01-25 Thread Diego Zuccato via users
Hello all. A user of our cluster is experiencing a weird problem that I can't pinpoint. He does have a job script that worked well on every node. It's based on Gadget2. Lately, *sometimes*, the same executable with the same parameters file works, sometimes it fails. On the same node and submi

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-25 Thread Matthias Leopold via users
just in case anyone wants to do more debugging: I ran "srun --mpi=pmix" now with "LD_DEBUG=all", the lines preceding the error are 1263345: symbol=write; lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0] 1263345: binding file /msc/sw/hpc-sdk/Linux_x86_64/21.9/comm_libs/mpi/lib/

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-25 Thread Ralph Castain via users
Never seen anything like that before - am I reading those errors correctly that it cannot find the "write" function symbol in libc?? Frankly, if that's true then it sounds like something is borked in the system. > On Jan 25, 2022, at 8:26 AM, Matthias Leopold via users > wrote: > > just in c

[OMPI users] Creating An MPI Job from Procs Launched by a Different Launcher

2022-01-25 Thread Saliya Ekanayake via users
Hi, I am trying to run an MPI program on a platform that launches the processes using a custom launcher (not mpiexec). This will end up spawning N processes of the program, but I am not sure if MPI_Init() would work or not in this case? Is it possible to have a group of processes launched by some

Re: [OMPI users] Creating An MPI Job from Procs Launched by a Different Launcher

2022-01-25 Thread Ralph Castain via users
Short answer is yes, but it is a bit complicated to do. On Jan 25, 2022, at 12:28 PM, Saliya Ekanayake via users <users@lists.open-mpi.org> wrote: Hi, I am trying to run an MPI program on a platform that launches the processes using a custom launcher (not mpiexec). This will end up spa

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-25 Thread Matthias Leopold via users
Thanks a lot for feedback to you and Gilles. I'm completely new to this, at least I know now what _should_ work. I'll look into the lmod part, maybe I screwed something there, I'm a newbie there too... Matthias On 25.01.22 at 18:17, Ralph Castain via users wrote: Never seen anything like tha

Re: [OMPI users] Gadget2 error 818 when using more than 1 process?

2022-01-25 Thread Jeff Squyres (jsquyres) via users
I'm afraid I don't know anything about Gadget, so I can't comment there. How exactly does the application fail? Can you try upgrading to Open MPI v4.1.2? What networking are you using? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Diego

Re: [OMPI users] Creating An MPI Job from Procs Launched by a Different Launcher

2022-01-25 Thread Saliya Ekanayake via users
Any pointers? On Tue, Jan 25, 2022 at 12:55 PM Ralph Castain via users < users@lists.open-mpi.org> wrote: > Short answer is yes, but it is a bit complicated to do. > > On Jan 25, 2022, at 12:28 PM, Saliya Ekanayake via users < > users@lists.open-mpi.org> wrote: > > Hi, > > I am trying to run an M

Re: [OMPI users] Creating An MPI Job from Procs Launched by a Different Launcher

2022-01-25 Thread Gilles Gouaillardet via users
You need a way for your process to exchange information so MPI_Init() can work. One option is to have your custom launcher implement a PMIx server https://pmix.github.io If you choose this path, you will likely want to use the Open PMIx reference implementation https://openpmix.github.io
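To make the "exchange information" requirement concrete, here is a toy sketch of the publish / synchronize / look-up pattern that a launcher-side key-value service enables. It is an illustration only: it does not use the PMIx API, and every name in it (ToyKVStore, toy_init, run_toy_job, the "endpoint.N" keys, the made-up addresses) is hypothetical. Threads stand in for the N processes started by the custom launcher.

```python
import threading


class ToyKVStore:
    """In-memory stand-in for the key-value service a launcher would provide.

    Hypothetical illustration only; this is NOT the PMIx API.
    """

    def __init__(self, nprocs):
        self._data = {}
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(nprocs)

    def put(self, key, value):
        # Publish one key so that peers can look it up later.
        with self._lock:
            self._data[key] = value

    def fence(self):
        # Block until every participant has reached this point.
        self._barrier.wait()

    def get(self, key):
        with self._lock:
            return self._data[key]


def toy_init(rank, nprocs, store, results):
    """What each 'process' does at start-up in this toy model."""
    # 1. Publish our own (made-up) contact address.
    store.put(f"endpoint.{rank}", f"node{rank}:{5000 + rank}")
    # 2. Wait until all ranks have published theirs.
    store.fence()
    # 3. Look up every peer's address; now each rank knows how to reach the others.
    results[rank] = [store.get(f"endpoint.{r}") for r in range(nprocs)]


def run_toy_job(nprocs):
    """Start nprocs threads that stand in for launcher-spawned processes."""
    store = ToyKVStore(nprocs)
    results = {}
    threads = [
        threading.Thread(target=toy_init, args=(r, nprocs, store, results))
        for r in range(nprocs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for rank, peers in sorted(run_toy_job(3).items()):
        print(rank, peers)
```

In a real deployment the store would live in the launcher (or a daemon it starts) and the processes would reach it over a socket or shared memory; the sketch keeps everything in one Python process only to show the ordering of the three steps.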

Re: [OMPI users] Gadget2 error 818 when using more than 1 process?

2022-01-25 Thread Diego Zuccato via users
On 26/01/2022 02:10, Jeff Squyres (jsquyres) via users wrote: I'm afraid I don't know anything about Gadget, so I can't comment there. How exactly does the application fail? Neither did I :( It fails saying a 'timestep' is 0, and that's usually caused by an error in the parameters file.