Ha! I finally tracked it down - a new code path that bypassed the prior error output. I have a fix going into master shortly, and will then port it to 1.10.1.
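In the meantime, a quick check of the agent setting before launching at least makes the real failure obvious; this is just a sketch, and the ompi_info syntax below assumes a 1.7-or-newer install:

    # See which rsh/ssh agent mpirun will actually use (level 9 exposes all MCA params)
    ompi_info --param plm rsh --level 9 | grep plm_rsh_agent

    # Check for, and clear, an accidental override coming from the environment
    env | grep OMPI_MCA_plm_rsh_agent
    unset OMPI_MCA_plm_rsh_agent

With the override cleared, the launch falls back to the built-in ssh/rsh default instead of silently failing to reach the remote node.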
Thanks for your patience!
Ralph

> On Sep 24, 2015, at 1:12 AM, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:
>
> Sorry for the delay. Running mpirun with a wrong OMPI_MCA_plm_rsh_agent doesn't
> give any explicit message in OpenMPI-1.10.0.
>
> Here is how I can show the problem:
>
> I request 2 nodes, 1 cpu on each node, 4 cores on each cpu (so 8 cores
> available, with cpusets). The node file is:
>
> [begou@frog7 MPI_TESTS]$ cat $OAR_NODEFILE
> frog7
> frog7
> frog7
> frog7
> frog8
> frog8
> frog8
> frog8
>
> I launch the application (I've added a grep here to limit the output on
> stdout and just check process locations):
>
> [begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile $OAR_NODEFILE --bind-to core ./location.exe | grep 'thread is now running on PU'
> (process 2) thread is now running on PU logical index 2 (OS/physical index 6) on system frog7
> (process 3) thread is now running on PU logical index 3 (OS/physical index 7) on system frog7
> (process 0) thread is now running on PU logical index 0 (OS/physical index 0) on system frog7
> (process 1) thread is now running on PU logical index 1 (OS/physical index 5) on system frog7
> (process 6) thread is now running on PU logical index 2 (OS/physical index 2) on system frog8
> (process 7) thread is now running on PU logical index 3 (OS/physical index 3) on system frog8
> (process 4) thread is now running on PU logical index 0 (OS/physical index 0) on system frog8
> (process 5) thread is now running on PU logical index 1 (OS/physical index 1) on system frog8
>
> So there is one process on each core, and no oversubscribing is allowed with the
> patch applied in OpenMPI.
>
> Now I set OMPI_MCA_plm_rsh_agent to something wrong and launch the job again
> (without the final grep, to show the full output):
>
> [begou@frog7 MPI_TESTS]$ export OMPI_MCA_plm_rsh_agent=do-not-exist
> [begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile $OAR_NODEFILE --bind-to core ./location.exe
> --------------------------------------------------------------------------
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>    Bind to:     CORE
>    Node:        frog7
>    #processes:  2
>    #cpus:       1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --------------------------------------------------------------------------
>
> The message only shows that OpenMPI is trying to allocate all the processes on
> the local node: nothing hints that the rsh agent is the problem.
>
> Of course:
> [begou@frog7 MPI_TESTS]$ which do-not-exist
> /usr/bin/which: no do-not-exist in (/home/PROJECTS/...............
>
>
> Patrick
>
> --
> ===================================================================
> |  Equipe M.O.S.T.        |                                       |
> |  Patrick BEGOU          | mailto:patrick.be...@grenoble-inp.fr  |
> |  LEGI                   |                                       |
> |  BP 53 X                | Tel 04 76 82 51 35                    |
> |  38041 GRENOBLE CEDEX   | Fax 04 76 82 52 71                    |
> ===================================================================
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/09/27659.php