Hi,

I’m trying to debug a user’s program that uses dynamic process management through Rmpi + doMPI. We’re seeing a hang in MPI_Comm_disconnect. Each of the processes is in:

#0 0x7ff72513168c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7ff7130760d3 in PMIx_Disconnect
The problem here is that you have made an incorrect assumption. In older
OMPI versions, the -H option simply indicated that the specified hosts were
available for use; it did not imply the number of slots on each host. Since
you have specified 2 slots on each host, and you told mpirun to
You can try disabling the SLURM support:
mpirun --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd ...
That requires that you are able to SSH between the compute nodes.
Keep in mind this is far from ideal, since it might leave stray MPI
processes on the nodes if you cancel a job, and it can mess up SLURM accounting.
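Put together, a job script using this workaround might look like the sketch below. Only the --mca flags come from the command above; the rest of the script, and the application name ./my_app, are assumptions for illustration:

```shell
#!/bin/bash
#SBATCH --nodes=2
# Sketch only: launch with mpirun but exclude Open MPI's SLURM
# components (ras, plm, ess), so the remote daemons are started
# over SSH instead of through SLURM's launcher.
# ./my_app is a placeholder for the actual MPI application.
mpirun --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd ./my_app
```

As noted above, this only works if passwordless SSH between the allocated compute nodes is possible.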
Hi all,
I am trying to run an MPI application through the SLURM job scheduler. Here is my
running sequence:
sbatch --> my_env_script.sh --> my_run_script.sh --> mpirun
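For context, a minimal sketch of what the launching end of such a chain might contain. Only the script name my_run_script.sh comes from the sequence above; its contents here, and the application name ./my_app, are assumptions:

```shell
#!/bin/bash
# my_run_script.sh (hypothetical contents): build an explicit
# hostfile from SLURM's allocation and pass it to mpirun, rather
# than relying on mpirun to detect the allocation itself.
hostfile=$(mktemp)
scontrol show hostnames "$SLURM_JOB_NODELIST" > "$hostfile"
mpirun --hostfile "$hostfile" ./my_app   # ./my_app is a placeholder
```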
In order to minimize modifications to my production environment, I had to
set up the following hostlist management in the different scripts: