Hi Ralph, I misunderstood the point of the problem.
The problem is that BIND_TO_OBJ is retried and applied again in orte_ess_base_proc_binding @ ess_base_fns.c, even though you set BIND_TO_NONE in rmaps_rr_mappers.c when the node is oversubscribed. Furthermore, the binding done in orte_ess_base_proc_binding does not support cpus_per_rank. So when BIND_TO_CORE is specified and the node is oversubscribed with pe=N, the final binding we get is broken. If you really want to BIND_TO_NONE, you should delete the binding part of orte_ess_base_proc_binding. Or, if it is used for another purpose and cannot be deleted, it would be better to instead delete "OPAL_SET_BINDING_POLICY(OPAL_BIND_TO_NONE)" in the rr_mappers and just leave the warning message.

Tetsuya

> Hi Ralph, I have tested your fix - 30895. I'm afraid to say
> I found a mistake.
>
> You should include "SETTING BIND_TO_NONE" in the above if-clause
> at lines 74, 256, 511, 656. Otherwise, just the warning message
> disappears but binding to core is still overwritten by binding
> to none. Please see the attached patch.
>
> (See attached file: patch_from_30895)
>
> Tetsuya
>
> > Hi Ralph, I understood what you meant.
> >
> > I often use float for our application.
> > float c = (float)(unsigned int a - unsigned int b) could
> > be a very huge number if a < b. So I always carefully cast to
> > int from unsigned int when I subtract them. I didn't know/mind that
> > int d = (unsigned int a - unsigned int b) has no problem.
> > I noticed it thanks to your suggestion.
> >
> > Therefore, I think my fix is not necessary.
> >
> > Tetsuya
> >
> > > Yes, indeed. In the future, when we have many, many cores
> > > in the machine, we will have to take care of overrun of
> > > num_procs.
> > >
> > > Tetsuya
> > >
> > > > Cool - easily modified. Thanks!
> > > >
> > > > Of course, you understand (I'm sure) that the cast does nothing to
> > > > protect the code from blowing up if we overrun the var. In other words,
> > > > if the unsigned var has wrapped, then casting it to int
> > > > won't help - you'll still get a negative integer, and the code will
> > > > trash.
> > > >
> > > > On Feb 28, 2014, at 3:43 PM, tmish...@jcity.maeda.co.jp wrote:
> > > >
> > > > > Hi Ralph, I'm a little bit late for your release.
> > > > >
> > > > > I found a minor mistake in byobj_span - an integer casting problem.
> > > > >
> > > > > --- rmaps_rr_mappers.30892.c  2014-03-01 08:31:50 +0900
> > > > > +++ rmaps_rr_mappers.c        2014-03-01 08:33:22 +0900
> > > > > @@ -689,7 +689,7 @@
> > > > >      }
> > > > >
> > > > >      /* compute how many objs need an extra proc */
> > > > > -    if (0 > (nxtra_objs = app->num_procs - (navg * nobjs))) {
> > > > > +    if (0 > (nxtra_objs = (int)app->num_procs - (navg * (int)nobjs))) {
> > > > >          nxtra_objs = 0;
> > > > >      }
> > > > >
> > > > > Tetsuya
> > > > >
> > > > >> Please take a look at https://svn.open-mpi.org/trac/ompi/ticket/4317
> > > > >>
> > > > >> On Feb 27, 2014, at 8:13 PM, tmish...@jcity.maeda.co.jp wrote:
> > > > >>
> > > > >>> Hi Ralph, I can't operate our cluster for a few days, sorry.
> > > > >>>
> > > > >>> But now, I'm narrowing down the cause by browsing the source code.
> > > > >>>
> > > > >>> My best guess is line 529. opal_hwloc_base_get_obj_by_type will
> > > > >>> reset the object pointer to the first one when you move on to the
> > > > >>> next node.
> > > > >>>
> > > > >>> 529    if (NULL == (obj = opal_hwloc_base_get_obj_by_type(node->topology, target, cache_level, i, OPAL_HWLOC_AVAILABLE))) {
> > > > >>> 530        ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
> > > > >>> 531        return ORTE_ERR_NOT_FOUND;
> > > > >>> 532    }
> > > > >>>
> > > > >>> If node->slots=1, then nprocs is set to 1 in the second pass:
> > > > >>>
> > > > >>> 495    nprocs = (node->slots - node->slots_inuse) / orte_rmaps_base.cpus_per_rank;
> > > > >>> 496    if (nprocs < 1) {
> > > > >>> 497        if (second_pass) {
> > > > >>> 498            /* already checked for oversubscription permission, so at least put
> > > > >>> 499             * one proc on it
> > > > >>> 500             */
> > > > >>> 501            nprocs = 1;
> > > > >>>
> > > > >>> Therefore, opal_hwloc_base_get_obj_by_type is called one by one at each
> > > > >>> node, which means the object we get is always the first one.
> > > > >>>
> > > > >>> It's not elegant, but I guess you need dummy calls of
> > > > >>> opal_hwloc_base_get_obj_by_type to move the object pointer to the right
> > > > >>> place, or you need to modify opal_hwloc_base_get_obj_by_type itself.
> > > > >>>
> > > > >>> Tetsuya
> > > > >>>
> > > > >>>> I'm having trouble seeing why it is failing, so I added some more debug
> > > > >>>> output. Could you run the failure case again with -mca rmaps_base_verbose 10?
> > > > >>>>
> > > > >>>> Thanks
> > > > >>>> Ralph
> > > > >>>>
> > > > >>>> On Feb 27, 2014, at 6:11 PM, tmish...@jcity.maeda.co.jp wrote:
> > > > >>>>
> > > > >>>>> Just checking the difference - no significant meaning...
> > > > >>>>>
> > > > >>>>> Anyway, I guess it's due to the behavior when the slot count is missing
> > > > >>>>> (regarded as slots=1) and it's oversubscribed unintentionally.
> > > > >>>>>
> > > > >>>>> I'm going out now, so I can't verify it quickly. If I provide the
> > > > >>>>> correct slot counts, it will work, I guess. What do you think?
> > > > >>>>>
> > > > >>>>> Tetsuya
> > > > >>>>>
> > > > >>>>>> "restore" in what sense?
> > > > >>>>>>
> > > > >>>>>> On Feb 27, 2014, at 4:10 PM, tmish...@jcity.maeda.co.jp wrote:
> > > > >>>>>>
> > > > >>>>>>> Hi Ralph, this is just for your information.
> > > > >>>>>>>
> > > > >>>>>>> I tried to restore the previous orte_rmaps_rr_byobj.
Then I gets > > the > > > > >>> result > > > > >>>>>>> below with this command line: > > > > >>>>>>> > > > > >>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by > > > > > socket:pe=2 > > > > >>>>>>> -display-map -bind-to core:overload-allowed > > > > >>> ~/mis/openmpi/demos/myprog > > > > >>>>>>> Data for JOB [31184,1] offset 0 > > > > >>>>>>> > > > > >>>>>>> ======================== JOB MAP ======================== > > > > >>>>>>> > > > > >>>>>>> Data for node: node05 Num slots: 1 Max slots: 0 Num > > procs: > > > 7 > > > > >>>>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 0 > > > > >>>>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 2 > > > > >>>>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 4 > > > > >>>>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 6 > > > > >>>>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 1 > > > > >>>>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 3 > > > > >>>>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 5 > > > > >>>>>>> > > > > >>>>>>> Data for node: node06 Num slots: 1 Max slots: 0 Num > > procs: > > > 1 > > > > >>>>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 7 > > > > >>>>>>> > > > > >>>>>>> ============================================================= > > > > >>>>>>> [node06.cluster:18857] MCW rank 7 bound to socket 0[core 0 > [hwt > > > 0]], > > > > >>>>> socket > > > > >>>>>>> 0[core 1[hwt 0]]: [B/B/./.][./././.] > > > > >>>>>>> [node05.cluster:21399] MCW rank 3 bound to socket 1[core 6 > [hwt > > > 0]], > > > > >>>>> socket > > > > >>>>>>> 1[core 7[hwt 0]]: [./././.][././B/B] > > > > >>>>>>> [node05.cluster:21399] MCW rank 4 bound to socket 0[core 0 > [hwt > > > 0]], > > > > >>>>> socket > > > > >>>>>>> 0[core 1[hwt 0]]: [B/B/./.][./././.] > > > > >>>>>>> [node05.cluster:21399] MCW rank 5 bound to socket 1[core 4 > [hwt > > > 0]], > > > > >>>>> socket > > > > >>>>>>> 1[core 5[hwt 0]]: [./././.][B/B/./.] > > > > >>>>>>> [node05.cluster:21399] MCW rank 6 bound to socket 0[core 2 > [hwt > > > 0]], > > > > >>>>> socket > > > > >>>>>>> 0[core 3[hwt 0]]: [././B/B][./././.] > > > > >>>>>>> [node05.cluster:21399] MCW rank 0 bound to socket 0[core 0 > [hwt > > > 0]], > > > > >>>>> socket > > > > >>>>>>> 0[core 1[hwt 0]]: [B/B/./.][./././.] > > > > >>>>>>> [node05.cluster:21399] MCW rank 1 bound to socket 1[core 4 > [hwt > > > 0]], > > > > >>>>> socket > > > > >>>>>>> 1[core 5[hwt 0]]: [./././.][B/B/./.] > > > > >>>>>>> [node05.cluster:21399] MCW rank 2 bound to socket 0[core 2 > [hwt > > > 0]], > > > > >>>>> socket > > > > >>>>>>> 0[core 3[hwt 0]]: [././B/B][./././.] > > > > >>>>>>> .... 
> > > > >>>>>>>
> > > > >>>>>>> Then I add "-hostfile pbs_hosts" and the result is:
> > > > >>>>>>>
> > > > >>>>>>> [mishima@manage work]$ cat pbs_hosts
> > > > >>>>>>> node05 slots=8
> > > > >>>>>>> node06 slots=8
> > > > >>>>>>> [mishima@manage work]$ mpirun -np 8 -hostfile ~/work/pbs_hosts
> > > > >>>>>>> -report-bindings -map-by socket:pe=2 -display-map ~/mis/openmpi/demos/myprog
> > > > >>>>>>> Data for JOB [30254,1] offset 0
> > > > >>>>>>>
> > > > >>>>>>> ======================== JOB MAP ========================
> > > > >>>>>>>
> > > > >>>>>>> Data for node: node05  Num slots: 8  Max slots: 0  Num procs: 4
> > > > >>>>>>>     Process OMPI jobid: [30254,1] App: 0 Process rank: 0
> > > > >>>>>>>     Process OMPI jobid: [30254,1] App: 0 Process rank: 2
> > > > >>>>>>>     Process OMPI jobid: [30254,1] App: 0 Process rank: 1
> > > > >>>>>>>     Process OMPI jobid: [30254,1] App: 0 Process rank: 3
> > > > >>>>>>>
> > > > >>>>>>> Data for node: node06  Num slots: 8  Max slots: 0  Num procs: 4
> > > > >>>>>>>     Process OMPI jobid: [30254,1] App: 0 Process rank: 4
> > > > >>>>>>>     Process OMPI jobid: [30254,1] App: 0 Process rank: 6
> > > > >>>>>>>     Process OMPI jobid: [30254,1] App: 0 Process rank: 5
> > > > >>>>>>>     Process OMPI jobid: [30254,1] App: 0 Process rank: 7
> > > > >>>>>>>
> > > > >>>>>>> =============================================================
> > > > >>>>>>> [node05.cluster:21501] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > > >>>>>>> [node05.cluster:21501] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> > > > >>>>>>> [node05.cluster:21501] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > > >>>>>>> [node05.cluster:21501] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> > > > >>>>>>> [node06.cluster:18935] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > > >>>>>>> [node06.cluster:18935] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> > > > >>>>>>> [node06.cluster:18935] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > > >>>>>>> [node06.cluster:18935] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> > > > >>>>>>> ....
> > > > >>>>>>>
> > > > >>>>>>> I think the previous version's behavior would be close to what I expect.
> > > > >>>>>>>
> > > > >>>>>>> Tetsuya
> > > > >>>>>>>
> > > > >>>>>>>> They have 4 cores/socket and 2 sockets, so 4 x 2 = 8 cores each in total.
> > > > >>>>>>>>
> > > > >>>>>>>> Here is the output of lstopo.
> > > > >>>>>>>>
> > > > >>>>>>>> [mishima@manage round_robin]$ rsh node05
> > > > >>>>>>>> Last login: Tue Feb 18 15:10:15 from manage
> > > > >>>>>>>> [mishima@node05 ~]$ lstopo
> > > > >>>>>>>> Machine (32GB)
> > > > >>>>>>>>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB)
> > > > >>>>>>>>     L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
> > > > >>>>>>>>     L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
> > > > >>>>>>>>     L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
> > > > >>>>>>>>     L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
> > > > >>>>>>>>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (6144KB)
> > > > >>>>>>>>     L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
> > > > >>>>>>>>     L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
> > > > >>>>>>>>     L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
> > > > >>>>>>>>     L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
> > > > >>>>>>>> ....
> > > > >>>>>>>>
> > > > >>>>>>>> I focused on byobj_span and bynode. I didn't notice byobj was modified, sorry.
> > > > >>>>>>>>
> > > > >>>>>>>> Tetsuya
> > > > >>>>>>>>
> > > > >>>>>>>>> Hmmm... what does your node look like again (sockets and cores)?
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Feb 27, 2014, at 3:19 PM, tmish...@jcity.maeda.co.jp wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Hi Ralph, I'm afraid to say your new "map-by obj" causes another problem.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I get an overload message with this command line, as shown below:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
> > > > >>>>>>>>>> -display-map ~/mis/openmpi/demos/myprog
> > > > >>>>>>>>>> --------------------------------------------------------------------------
> > > > >>>>>>>>>> A request was made to bind to that would result in binding more
> > > > >>>>>>>>>> processes than cpus on a resource:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>    Bind to:     CORE
> > > > >>>>>>>>>>    Node:        node05
> > > > >>>>>>>>>>    #processes:  2
> > > > >>>>>>>>>>    #cpus:       1
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> You can override this protection by adding the "overload-allowed"
> > > > >>>>>>>>>> option to your binding directive.
> > > > >>>>>>>>>> --------------------------------------------------------------------------
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Then, I add "-bind-to core:overload-allowed" to see what happens.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
> > > > >>>>>>>>>> -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
> > > > >>>>>>>>>> Data for JOB [14398,1] offset 0
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> ======================== JOB MAP ========================
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Data for node: node05  Num slots: 1  Max slots: 0  Num procs: 4
> > > > >>>>>>>>>>     Process OMPI jobid: [14398,1] App: 0 Process rank: 0
> > > > >>>>>>>>>>     Process OMPI jobid: [14398,1] App: 0 Process rank: 1
> > > > >>>>>>>>>>     Process OMPI jobid: [14398,1] App: 0 Process rank: 2
> > > > >>>>>>>>>>     Process OMPI jobid: [14398,1] App: 0 Process rank: 3
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Data for node: node06  Num slots: 1  Max slots: 0  Num procs: 4
> > > > >>>>>>>>>>     Process OMPI jobid: [14398,1] App: 0 Process rank: 4
> > > > >>>>>>>>>>     Process OMPI jobid: [14398,1] App: 0 Process rank: 5
> > > > >>>>>>>>>>     Process OMPI jobid: [14398,1] App: 0 Process rank: 6
> > > > >>>>>>>>>>     Process OMPI jobid: [14398,1] App: 0 Process rank: 7
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> =============================================================
> > > > >>>>>>>>>> [node06.cluster:18443] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > > >>>>>>>>>> [node05.cluster:20901] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > > >>>>>>>>>> [node06.cluster:18443] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > > >>>>>>>>>> [node05.cluster:20901] MCW rank 3 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > > >>>>>>>>>> [node06.cluster:18443] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > > >>>>>>>>>> [node05.cluster:20901] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > > >>>>>>>>>> [node06.cluster:18443] MCW rank 5 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > > >>>>>>>>>> [node05.cluster:20901] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > > >>>>>>>>>> Hello world from process 4 of 8
> > > > >>>>>>>>>> Hello world from process 2 of 8
> > > > >>>>>>>>>> Hello world from process 6 of 8
> > > > >>>>>>>>>> Hello world from process 0 of 8
> > > > >>>>>>>>>> Hello world from process 5 of 8
> > > > >>>>>>>>>> Hello world from process 1 of 8
> > > > >>>>>>>>>> Hello world from process 7 of 8
> > > > >>>>>>>>>> Hello world from process 3 of 8
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> When I add "map-by obj:span", it works fine:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2,span
> > > > >>>>>>>>>> -display-map ~/mis/openmpi/demos/myprog
> > > > >>>>>>>>>> Data for JOB [14703,1] offset 0
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> ======================== JOB MAP ========================
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Data for node: node05  Num slots: 1  Max slots: 0  Num procs: 4
> > > > >>>>>>>>>>     Process OMPI jobid: [14703,1] App: 0 Process rank: 0
> > > > >>>>>>>>>>     Process OMPI jobid: [14703,1] App: 0 Process rank: 2
> > > > >>>>>>>>>>     Process OMPI jobid: [14703,1] App: 0 Process rank: 1
> > > > >>>>>>>>>>     Process OMPI jobid: [14703,1] App: 0 Process rank: 3
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Data for node: node06  Num slots: 1  Max slots: 0  Num procs: 4
> > > > >>>>>>>>>>     Process OMPI jobid: [14703,1] App: 0 Process rank: 4
> > > > >>>>>>>>>>     Process OMPI jobid: [14703,1] App: 0 Process rank: 6
> > > > >>>>>>>>>>     Process OMPI jobid: [14703,1] App: 0 Process rank: 5
> > > > >>>>>>>>>>     Process OMPI jobid: [14703,1] App: 0 Process rank: 7
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> =============================================================
> > > > >>>>>>>>>> [node06.cluster:18491] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > > >>>>>>>>>> [node05.cluster:20949] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > > >>>>>>>>>> [node06.cluster:18491] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> > > > >>>>>>>>>> [node05.cluster:20949] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> > > > >>>>>>>>>> [node06.cluster:18491] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > > >>>>>>>>>> [node05.cluster:20949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > > >>>>>>>>>> [node06.cluster:18491] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> > > > >>>>>>>>>> [node05.cluster:20949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> > > > >>>>>>>>>> ....
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> So, byobj_span would be okay. Of course, bynode and byslot should be okay.
> > > > >>>>>>>>>> Could you take a look at orte_rmaps_rr_byobj again?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Regards,
> > > > >>>>>>>>>> Tetsuya Mishima
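
P.S. Since the unsigned-subtraction point comes up a couple of times in the quoted exchange above, here is a minimal, standalone C illustration (plain C only, not OMPI code; the variable names merely mimic num_procs/navg/nobjs for the sake of the example):

#include <stdio.h>

int main(void)
{
    /* Stand-ins for app->num_procs (unsigned in ORTE) and the helper vars. */
    unsigned int num_procs = 8;   /* requested number of procs */
    unsigned int nobjs     = 5;   /* number of objects         */
    int          navg      = 2;   /* average procs per object  */

    /* The subtraction is done in unsigned arithmetic, so 8 - 10 wraps
     * around to a huge positive value instead of -2. */
    unsigned int wrapped = num_procs - (navg * nobjs);
    printf("unsigned result: %u\n", wrapped);   /* 4294967294 with 32-bit unsigned int */

    /* Converting the same expression back to int recovers -2 on the usual
     * two's-complement platforms, so a "0 > nxtra_objs" check still works
     * even without an explicit cast ... */
    int nxtra_objs = num_procs - (navg * nobjs);
    printf("int result: %d\n", nxtra_objs);     /* -2 */

    /* ... but converting to float keeps the huge wrapped value, which is
     * the trap mentioned above when the result is used as a float. */
    float f = (float)(num_procs - (navg * nobjs));
    printf("float result: %f\n", f);            /* roughly 4.29e9 */

    return 0;
}

And, as Ralph points out, if the unsigned value itself has already wrapped, no cast can repair it.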