Ha! I finally tracked it down - a new code path that bypassed the prior error 
output. I have a fix going into master shortly, and will then port it to 1.10.1.

Thanks for your patience!
Ralph


> On Sep 24, 2015, at 1:12 AM, Patrick Begou 
> <patrick.be...@legi.grenoble-inp.fr> wrote:
> 
> Sorry for the delay. Running mpirun with a wrong OMPI_MCA_plm_rsh_agent doesn't 
> give any explicit message in OpenMPI-1.10.0.
> 
> Here is how I can show the problem:
> 
> I request 2 nodes, 1 CPU on each node, 4 cores on each CPU (so 8 cores 
> available with cpusets). The node file is:
> 
> [begou@frog7 MPI_TESTS]$ cat $OAR_NODEFILE
> frog7
> frog7
> frog7
> frog7
> frog8
> frog8
> frog8
> frog8
> 
> I launch the application (I've added a grep here to limit the output on 
> stdout and just check the process locations):
> 
> [begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile $OAR_NODEFILE --bind-to core 
> ./location.exe |grep 'thread is now running on PU'
> (process 2) thread is now running on PU logical index 2 (OS/physical index 6) 
> on system frog7
> (process 3) thread is now running on PU logical index 3 (OS/physical index 7) 
> on system frog7
> (process 0) thread is now running on PU logical index 0 (OS/physical index 0) 
> on system frog7
> (process 1) thread is now running on PU logical index 1 (OS/physical index 5) 
> on system frog7
> (process 6) thread is now running on PU logical index 2 (OS/physical index 2) 
> on system frog8
> (process 7) thread is now running on PU logical index 3 (OS/physical index 3) 
> on system frog8
> (process 4) thread is now running on PU logical index 0 (OS/physical index 0) 
> on system frog8
> (process 5) thread is now running on PU logical index 1 (OS/physical index 1) 
> on system frog8
> 
> So there is one process on each core, and no oversubscribing is allowed with 
> the patch applied in OpenMPI.
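> 
> For reference, a minimal sketch of a binding-report program in the spirit of 
> location.exe, assuming MPI plus hwloc (which uses the same "logical index" / 
> "OS index" terminology as the output above), could look like the following; 
> the file name location.c and the exact message format are illustrative only:
> 
> /* Hypothetical sketch: each MPI rank reports the PU it is currently
>    running on, as hwloc logical and OS indexes. */
> #include <stdio.h>
> #include <mpi.h>
> #include <hwloc.h>
> 
> int main(int argc, char **argv)
> {
>     int rank, len;
>     char host[MPI_MAX_PROCESSOR_NAME];
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Get_processor_name(host, &len);
> 
>     hwloc_topology_t topo;
>     hwloc_topology_init(&topo);
>     hwloc_topology_load(topo);
> 
>     /* Ask hwloc which PU this process last ran on. */
>     hwloc_bitmap_t set = hwloc_bitmap_alloc();
>     hwloc_get_last_cpu_location(topo, set, HWLOC_CPUBIND_PROCESS);
> 
>     int os_idx = hwloc_bitmap_first(set);
>     hwloc_obj_t pu = (os_idx >= 0)
>         ? hwloc_get_pu_obj_by_os_index(topo, (unsigned) os_idx) : NULL;
> 
>     printf("(process %d) thread is now running on PU logical index %d "
>            "(OS/physical index %d) on system %s\n",
>            rank, pu ? (int) pu->logical_index : -1, os_idx, host);
> 
>     hwloc_bitmap_free(set);
>     hwloc_topology_destroy(topo);
>     MPI_Finalize();
>     return 0;
> }
> 
> Built, under these assumptions, with something like 
> "mpicc location.c -o location.exe -lhwloc".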
> 
> Now I set OMPI_MCA_plm_rsh_agent to something wrong and launch the job again 
> (without the final grep, to get all the information):
> 
> [begou@frog7 MPI_TESTS]$ export OMPI_MCA_plm_rsh_agent=do-not-exist
> [begou@frog7 MPI_TESTS]$ mpirun -np 8 --hostfile $OAR_NODEFILE --bind-to core 
> ./location.exe
> --------------------------------------------------------------------------
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
> 
>   Bind to:     CORE
>   Node:        frog7
>   #processes:  2
>   #cpus:       1
> 
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --------------------------------------------------------------------------
> 
> The message only shows, indirectly, that OpenMPI tried to place all of the 
> processes on the local node.
> 
> Of course:
> [begou@frog7 MPI_TESTS]$ which do-not-exist
> /usr/bin/which: no do-not-exist in (/home/PROJECTS/...............
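> 
> One way to get more detail on what the launcher is actually doing (not shown 
> in the run above) is to raise the plm verbosity, e.g.:
> 
> mpirun --mca plm_base_verbose 10 -np 8 --hostfile $OAR_NODEFILE --bind-to core ./location.exe
> 
> With a broken plm_rsh_agent this should at least show the rsh launcher being 
> selected and failing to start the remote daemons, assuming the parameter 
> behaves the same way in the 1.10 series.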
> 
> 
> Patrick
> 
> -- 
> ===================================================================
> |  Equipe M.O.S.T.         |                                      |
> |  Patrick BEGOU           | mailto:patrick.be...@grenoble-inp.fr |
> |  LEGI                    |                                      |
> |  BP 53 X                 | Tel 04 76 82 51 35                   |
> |  38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71                   |
> ===================================================================
> 
