Re: [OMPI devel] Problem with bind-to

2017-04-14 Thread Cyril Bordage
Tested with success. Thanks guys. Cyril. On 14/04/2017 at 11:39, r...@open-mpi.org wrote: > PR https://github.com/open-mpi/ompi/pull/3356 > >> On Apr 14, 2017, at 2:22 AM, r...@open-mpi.org >> wrote: >> >> Ah, wait - I had missed your bind-to core directive. With th

Re: [OMPI devel] Problem with bind-to

2017-04-14 Thread r...@open-mpi.org
Ah, wait - I had missed your bind-to core directive. With that, it does indeed behave poorly, so I can now replicate. > On Apr 14, 2017, at 2:21 AM, r...@open-mpi.org wrote: > > Sorry, but both of your non-working examples work fine for me: > > $ mpirun -n 16 -host rhc002:16 --report-bindings /

Re: [OMPI devel] Problem with bind-to

2017-04-14 Thread r...@open-mpi.org
Sorry, but both of your non-working examples work fine for me: $ mpirun -n 16 -host rhc002:16 --report-bindings /bin/true [rhc002.cluster:63444] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Gilles Gouaillardet
Ralph, I can easily reproduce the issue with two nodes and the latest master. All commands are run on n1, which has the same topology (2 sockets * 8 cores each) as n2. 1) everything works $ mpirun -np 16 -bind-to core --report-bindings true [n1:29794] MCW rank 0 bound to socket 0[core 0[hw

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread r...@open-mpi.org
All right, let’s replace rmaps_base_verbose with odls_base_verbose and see what that says > On Apr 13, 2017, at 8:34 AM, Cyril Bordage wrote: > > '-report-bindings' does that. > I used this option because the ranks did not seem to be bound (if I use > a rank file the performance is far better)
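
For reference, the full debug command would presumably look something like the following; this is a sketch that reuses the flags already discussed in this thread, with the machinefile name "mf" and "./a.out" taken from Cyril's launch line:

  $ mpirun -np 48 -machinefile mf -bind-to core -report-bindings \
        --leave-session-attached -mca odls_base_verbose 5 ./a.out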

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Cyril Bordage
'-report-bindings' does that. I used this option because the ranks did not seem to be bound (if I use a rank file the performance is far better). On 13/04/2017 at 17:24, r...@open-mpi.org wrote: > Okay, so as far as OMPI is concerned, it correctly bound everyone! So how are > you generating thi
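
For context, an Open MPI rankfile pins each rank to an explicit slot, which is why it side-steps the binding problem. A minimal sketch (the hostname and slot assignments here are illustrative, not Cyril's actual file):

  rank 0=miriel025 slot=0:0
  rank 1=miriel025 slot=0:1

It would be passed to mpirun with "-rf <rankfile>" instead of "-bind-to core".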

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread r...@open-mpi.org
Okay, so as far as OMPI is concerned, it correctly bound everyone! So how are you generating this output claiming it isn’t bound? > On Apr 13, 2017, at 7:57 AM, Cyril Bordage wrote: > > devel11:17550] [[29888,0],0] rmaps:base set policy with NULL device NONNULL > [devel11:17550] mca:rmaps:selec

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Cyril Bordage
devel11:17550] [[29888,0],0] rmaps:base set policy with NULL device NONNULL [devel11:17550] mca:rmaps:select: checking available component mindist [devel11:17550] mca:rmaps:select: Querying component [mindist] [devel11:17550] mca:rmaps:select: checking available component ppr [devel11:17550] mca:rm

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread r...@open-mpi.org
Okay, so your login node was able to figure out all the bindings. I don’t see any debug output from your compute nodes, which is suspicious. Try adding --leave-session-attached to the cmd line and let’s see if we can capture the compute node daemon’s output > On Apr 13, 2017, at 7:48 AM, Cyril B

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Cyril Bordage
My machine file is: miriel025*24 miriel026*24 On 13/04/2017 at 16:46, Cyril Bordage wrote: > Here is the output: > ## > [devel11:80858] [[2965,0],0] rmaps:base set policy with NULL device NONNULL > [devel11:80858] mca:r
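
For comparison, the same machine file written with the slots= keyword from the Open MPI hostfile documentation would presumably be (assuming "*24" means 24 slots per node):

  miriel025 slots=24
  miriel026 slots=24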

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Cyril Bordage
Here is the output: ## [devel11:80858] [[2965,0],0] rmaps:base set policy with NULL device NONNULL [devel11:80858] mca:rmaps:select: checking available component mindist [devel11:80858] mca:rmaps:select: Querying component

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Gilles Gouaillardet
Is your compute node included in your machine file? If yes, what if you invoke mpirun from a compute node not listed in your machine file? It can also be helpful to post your machinefile. Cheers, Gilles On Thursday, April 13, 2017, Cyril Bordage wrote: > When I run this command from the compu

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread r...@open-mpi.org
Try adding "-mca rmaps_base_verbose 5" and see what that output tells us - I assume you have a debug build configured, yes (i.e., added --enable-debug to the configure line)? > On Apr 13, 2017, at 7:28 AM, Cyril Bordage wrote: > > When I run this command from the compute node I also get that. Bu
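
A minimal sketch of both steps (the install prefix is a placeholder; the mpirun line reuses Cyril's command from elsewhere in this thread):

  $ ./configure --enable-debug --prefix=$HOME/ompi-debug && make -j install
  $ mpirun -np 48 -machinefile mf -bind-to core -report-bindings \
        -mca rmaps_base_verbose 5 ./a.out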

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Cyril Bordage
When I run this command from the compute node I also get that. But not when I run it from a login node (with the same machine file). Cyril. On 13/04/2017 at 16:22, r...@open-mpi.org wrote: > We are asking all these questions because we cannot replicate your problem - > so we are trying to he

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread r...@open-mpi.org
We are asking all these questions because we cannot replicate your problem - so we are trying to help you figure out what is different or missing from your machine. When I run your cmd line on my system, I get: [rhc002.cluster:55965] MCW rank 24 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread gilles
OK thanks, we've had some issues in the past when Open MPI assumed that the (login) node running mpirun has the same topology as the other (compute) nodes. I just wanted to rule out this scenario. Cheers, Gilles - Original Message - > I am using the 6886c12 commit. > I have no particula

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Cyril Bordage
I am using the 6886c12 commit. I have no particular options for the configuration. I launch my application in the same way as I presented in my first email; here is the exact line: mpirun -np 48 -machinefile mf -bind-to core -report-bindings ./a.out lstopo does give the same output on both types on
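
One way to compare the two topologies (a sketch; lstopo ships with hwloc, and --of console forces plain-text output):

  $ lstopo --of console > login_topo.txt     # on the login node
  $ lstopo --of console > compute_topo.txt   # on a compute node
  $ diff login_topo.txt compute_topo.txt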

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread gilles
Also, can you please run lstopo on both your login and compute nodes? Cheers, Gilles - Original Message - > Can you be a bit more specific? > > - What version of Open MPI are you using? > - How did you configure Open MPI? > - How are you launching Open MPI applications? > > > > On A

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Jeff Squyres (jsquyres)
Can you be a bit more specific? - What version of Open MPI are you using? - How did you configure Open MPI? - How are you launching Open MPI applications? > On Apr 13, 2017, at 9:08 AM, Cyril Bordage wrote: > > Hi, > > this bug now also happens when I launch my mpirun command from the > compu

Re: [OMPI devel] Problem with bind-to

2017-04-13 Thread Cyril Bordage
Hi, this bug now also happens when I launch my mpirun command from the compute node. Cyril. On 06/04/2017 at 05:38, r...@open-mpi.org wrote: > I believe this has been fixed now - please let me know > >> On Mar 30, 2017, at 1:57 AM, Cyril Bordage wrote: >> >> Hello, >> >> I am using the git

Re: [OMPI devel] Problem with bind-to

2017-04-05 Thread r...@open-mpi.org
I believe this has been fixed now - please let me know > On Mar 30, 2017, at 1:57 AM, Cyril Bordage wrote: > > Hello, > > I am using the git version of Open MPI with "-bind-to core -report-bindings" > and I get this for all processes: > [miriel010:160662] MCW rank 0 not bound > > > When I use an o

[OMPI devel] Problem with bind-to

2017-03-30 Thread Cyril Bordage
Hello, I am using the git version of Open MPI with "-bind-to core -report-bindings" and I get this for all processes: [miriel010:160662] MCW rank 0 not bound When I use an old version I get: [miriel010:44921] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././.][././././././././././.