Re: [OMPI users] Debugging a crash

2021-01-31 Thread Diego Zuccato via users
On 29/01/21 15:58, Gilles Gouaillardet via users wrote:
Hi Gilles. Thanks for the answer.
> the mpirun command line starts 2 MPI tasks, but the error log mentions
> rank 56, so unless there is a copy/paste error, this is highly
> suspicious.
Uhm... Going to re-check. Most probably it's just my

Re: [OMPI users] Debugging a crash

2021-01-29 Thread Gilles Gouaillardet via users
Diego, the mpirun command line starts 2 MPI tasks, but the error log mentions rank 56, so unless there is a copy/paste error, this is highly suspicious. I invite you to check the filesystem usage on this node and make sure there is a similar amount of available space in /tmp and /dev/shm (or othe
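The check Gilles suggests (comparing free space in /tmp and /dev/shm on the failing node against a healthy one) could be scripted roughly as below. This is a minimal sketch, assuming a POSIX node where Python's `os.statvfs` is available; the helper names `available_bytes` and `report` are hypothetical, and the only paths checked are the two named in the message.

```python
import os


def available_bytes(path: str) -> int:
    """Bytes available to unprivileged users on the filesystem holding `path`."""
    st = os.statvfs(path)
    # f_bavail: free blocks available to non-root users; f_frsize: fragment size in bytes
    return st.f_bavail * st.f_frsize


def report(paths=("/tmp", "/dev/shm")) -> dict:
    """Map each existing directory to its available space in MiB (hypothetical helper)."""
    result = {}
    for p in paths:
        # Skip paths that do not exist on this node instead of raising
        if os.path.isdir(p):
            result[p] = available_bytes(p) // (1024 * 1024)
    return result


if __name__ == "__main__":
    for path, mib in report().items():
        print(f"{path}: {mib} MiB available")
```

Running it on the suspect node and on a node where the job succeeds gives two small reports to compare side by side; a markedly smaller figure on the failing node would support the filesystem hypothesis.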

[OMPI users] Debugging a crash

2021-01-29 Thread Diego Zuccato via users
Hello all. I'm having a problem with a job: if it gets scheduled on a specific node of our cluster, it fails with: -8<-- -- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the