[OMPI users] 3.x - hang in MPI_Comm_disconnect

2018-05-16 Thread Ben Menadue
Hi, I’m trying to debug a user’s program that uses dynamic process management through Rmpi + doMPI. We’re seeing a hang in MPI_Comm_disconnect. Each of the processes is in

#0  0x7ff72513168c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x7ff7130760d3 in PMIx_Disconnect
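For context, the dynamic process management that Rmpi/doMPI drives boils down to MPI_Comm_spawn on the manager side and MPI_Comm_disconnect on both sides. A minimal sketch of that call sequence (hypothetical program, not the user's actual code; compile with mpicc):

```c
/* Illustrative sketch only: a manager spawns workers and later
 * disconnects. MPI_Comm_disconnect is collective over the
 * intercommunicator, so every connected process must reach it,
 * or the others block -- in OMPI 3.x that wait surfaces inside
 * PMIx_Disconnect, matching the backtrace above. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Init(&argc, &argv);

    /* Spawn 2 workers ("worker" binary name is a placeholder). */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    /* ... work exchanged over intercomm ... */

    /* Collective disconnect: manager and workers must all call it. */
    MPI_Comm_disconnect(&intercomm);

    MPI_Finalize();
    return 0;
}
```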

Re: [OMPI users] slurm configuration override mpirun command line process mapping

2018-05-16 Thread r...@open-mpi.org
The problem here is that you have made an incorrect assumption. In the older OMPI versions, the -H option simply indicated that the specified hosts were available for use - it did not imply the number of slots on that host. Since you have specified 2 slots on each host, and you told mpirun to
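To make the change in -H semantics concrete, here is a sketch of the two behaviours (host names and application are placeholders):

```shell
# Older OMPI: -H only names which hosts may be used; the slot
# counts come from elsewhere (e.g. the SLURM allocation).
mpirun -H node1,node2 -np 4 ./my_app

# Newer OMPI (3.x): append :N to state the slots per host explicitly.
mpirun -H node1:2,node2:2 -np 4 ./my_app
```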

Re: [OMPI users] slurm configuration override mpirun command line process mapping

2018-05-16 Thread Gilles Gouaillardet
You can try to disable SLURM: mpirun --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd ... That requires that you are able to SSH between compute nodes. Keep in mind this is far from ideal, since it might leave some MPI processes running on nodes if you cancel a job, and can mess up SLURM accounting
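Spelled out as a full command line (hostfile contents, node names, and the application are placeholders), the suggestion amounts to something like:

```shell
# Exclude the SLURM-aware components so mpirun falls back to its
# own ssh-based launcher and honours the mapping given on the
# command line instead of the SLURM allocation.
mpirun --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd \
       --hostfile my_hosts -np 4 --map-by node ./my_app

# my_hosts (example):
#   node1 slots=2
#   node2 slots=2
```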

[OMPI users] slurm configuration override mpirun command line process mapping

2018-05-16 Thread Nicolas Deladerriere
Hi all, I am trying to run an MPI application through the SLURM job scheduler. Here is my run sequence: sbatch --> my_env_script.sh --> my_run_script.sh --> mpirun. In order to minimize modification of my production environment, I had to set up the following hostlist management in different scripts: