Hi Ralph, sorry, I won't be able to run anything on our cluster for a few days.
In the meantime, I'm trying to narrow down the cause by reading the source
code. My best guess is line 529: opal_hwloc_base_get_obj_by_type starts over
from the first object every time the mapper moves on to the next node.

529        if (NULL == (obj = opal_hwloc_base_get_obj_by_type(node->topology, target, cache_level, i, OPAL_HWLOC_AVAILABLE))) {
530            ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
531            return ORTE_ERR_NOT_FOUND;
532        }

If node->slots is 1, then (node->slots - node->slots_inuse) / orte_rmaps_base.cpus_per_rank
falls below 1 and nprocs is forced to 1 on the second pass:

495        nprocs = (node->slots - node->slots_inuse) / orte_rmaps_base.cpus_per_rank;
496        if (nprocs < 1) {
497            if (second_pass) {
498                /* already checked for oversubscription permission, so at least put
499                 * one proc on it
500                 */
501                nprocs = 1;

Therefore, opal_hwloc_base_get_obj_by_type is called just once per node on
each cycle, with i starting from 0 again, so the object we get back is always
the first one. It's not elegant, but I guess you either need dummy calls to
opal_hwloc_base_get_obj_by_type to move the object pointer to the right place,
or you need to modify opal_hwloc_base_get_obj_by_type itself. (A rough sketch
of what I mean is appended at the very end of this mail, below the quoted
thread.)

Tetsuya

> I'm having trouble seeing why it is failing, so I added some more debug
> output. Could you run the failure case again with -mca rmaps_base_verbose 10?
>
> Thanks
> Ralph
>
> On Feb 27, 2014, at 6:11 PM, tmish...@jcity.maeda.co.jp wrote:
>
> > Just checking the difference, it doesn't mean anything significant...
> >
> > Anyway, I guess it's due to the behavior when the slot count is missing
> > (regarded as slots=1) and it's oversubscribed unintentionally.
> >
> > I'm going out now, so I can't verify it quickly. If I provide the
> > correct slot counts, it will work, I guess. What do you think?
> >
> > Tetsuya
> >
> >> "restore" in what sense?
> >>
> >> On Feb 27, 2014, at 4:10 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>> Hi Ralph, this is just for your information.
> >>>
> >>> I tried to restore the previous orte_rmaps_rr_byobj. Then I get the
> >>> result below with this command line:
> >>>
> >>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
> >>> Data for JOB [31184,1] offset 0
> >>>
> >>> ======================== JOB MAP ========================
> >>>
> >>> Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 7
> >>> Process OMPI jobid: [31184,1] App: 0 Process rank: 0
> >>> Process OMPI jobid: [31184,1] App: 0 Process rank: 2
> >>> Process OMPI jobid: [31184,1] App: 0 Process rank: 4
> >>> Process OMPI jobid: [31184,1] App: 0 Process rank: 6
> >>> Process OMPI jobid: [31184,1] App: 0 Process rank: 1
> >>> Process OMPI jobid: [31184,1] App: 0 Process rank: 3
> >>> Process OMPI jobid: [31184,1] App: 0 Process rank: 5
> >>>
> >>> Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 1
> >>> Process OMPI jobid: [31184,1] App: 0 Process rank: 7
> >>>
> >>> =============================================================
> >>> [node06.cluster:18857] MCW rank 7 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>> [node05.cluster:21399] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>> [node05.cluster:21399] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>> [node05.cluster:21399] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>> [node05.cluster:21399] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>> [node05.cluster:21399] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>> [node05.cluster:21399] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>> [node05.cluster:21399] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>> ....
> >>>
> >>> Then I add "-hostfile pbs_hosts" and the result is:
> >>>
> >>> [mishima@manage work]$cat pbs_hosts
> >>> node05 slots=8
> >>> node06 slots=8
> >>> [mishima@manage work]$ mpirun -np 8 -hostfile ~/work/pbs_hosts -report-bindings -map-by socket:pe=2 -display-map ~/mis/openmpi/demos/myprog
> >>> Data for JOB [30254,1] offset 0
> >>>
> >>> ======================== JOB MAP ========================
> >>>
> >>> Data for node: node05 Num slots: 8 Max slots: 0 Num procs: 4
> >>> Process OMPI jobid: [30254,1] App: 0 Process rank: 0
> >>> Process OMPI jobid: [30254,1] App: 0 Process rank: 2
> >>> Process OMPI jobid: [30254,1] App: 0 Process rank: 1
> >>> Process OMPI jobid: [30254,1] App: 0 Process rank: 3
> >>>
> >>> Data for node: node06 Num slots: 8 Max slots: 0 Num procs: 4
> >>> Process OMPI jobid: [30254,1] App: 0 Process rank: 4
> >>> Process OMPI jobid: [30254,1] App: 0 Process rank: 6
> >>> Process OMPI jobid: [30254,1] App: 0 Process rank: 5
> >>> Process OMPI jobid: [30254,1] App: 0 Process rank: 7
> >>>
> >>> =============================================================
> >>> [node05.cluster:21501] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>> [node05.cluster:21501] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>> [node05.cluster:21501] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>> [node05.cluster:21501] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>> [node06.cluster:18935] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>> [node06.cluster:18935] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>> [node06.cluster:18935] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>> [node06.cluster:18935] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>> ....
> >>>
> >>> I think the previous version's behavior is closer to what I expect.
> >>>
> >>> Tetsuya
> >>>
> >>>> They have 4 cores/socket and 2 sockets, so 4 x 2 = 8 cores each.
> >>>>
> >>>> Here is the output of lstopo.
> >>>>
> >>>> mishima@manage round_robin]$ rsh node05
> >>>> Last login: Tue Feb 18 15:10:15 from manage
> >>>> [mishima@node05 ~]$ lstopo
> >>>> Machine (32GB)
> >>>>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB)
> >>>>     L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
> >>>>     L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
> >>>>     L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
> >>>>     L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
> >>>>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (6144KB)
> >>>>     L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
> >>>>     L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
> >>>>     L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
> >>>>     L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
> >>>> ....
> >>>>
> >>>> I focused on byobj_span and bynode. I didn't notice byobj was modified, sorry.
> >>>>
> >>>> Tetsuya
> >>>>
> >>>>> Hmmm..what does your node look like again (sockets and cores)?
> >>>>>
> >>>>> On Feb 27, 2014, at 3:19 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>>>
> >>>>>> Hi Ralph, I'm afraid to say your new "map-by obj" causes another problem.
> >>>>>>
> >>>>>> I get an overload message with this command line, as shown below:
> >>>>>>
> >>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map ~/mis/openmpi/demos/myprog
> >>>>>> --------------------------------------------------------------------------
> >>>>>> A request was made to bind to that would result in binding more
> >>>>>> processes than cpus on a resource:
> >>>>>>
> >>>>>>    Bind to:     CORE
> >>>>>>    Node:        node05
> >>>>>>    #processes:  2
> >>>>>>    #cpus:       1
> >>>>>>
> >>>>>> You can override this protection by adding the "overload-allowed"
> >>>>>> option to your binding directive.
> >>>>>> --------------------------------------------------------------------------
> >>>>>>
> >>>>>> Then, I add "-bind-to core:overload-allowed" to see what happens.
> >>>>>>
> >>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
> >>>>>> Data for JOB [14398,1] offset 0
> >>>>>>
> >>>>>> ======================== JOB MAP ========================
> >>>>>>
> >>>>>> Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
> >>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 0
> >>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 1
> >>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 2
> >>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 3
> >>>>>>
> >>>>>> Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
> >>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 4
> >>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 5
> >>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 6
> >>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 7
> >>>>>>
> >>>>>> =============================================================
> >>>>>> [node06.cluster:18443] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>> [node05.cluster:20901] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>> [node06.cluster:18443] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>> [node05.cluster:20901] MCW rank 3 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>> [node06.cluster:18443] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>> [node05.cluster:20901] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>> [node06.cluster:18443] MCW rank 5 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>> [node05.cluster:20901] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>> Hello world from process 4 of 8
> >>>>>> Hello world from process 2 of 8
> >>>>>> Hello world from process 6 of 8
> >>>>>> Hello world from process 0 of 8
> >>>>>> Hello world from process 5 of 8
> >>>>>> Hello world from process 1 of 8
> >>>>>> Hello world from process 7 of 8
> >>>>>> Hello world from process 3 of 8
> >>>>>>
> >>>>>> When I add "map-by obj:span", it works fine:
> >>>>>>
> >>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2,span -display-map ~/mis/openmpi/demos/myprog
> >>>>>> Data for JOB [14703,1] offset 0
> >>>>>>
> >>>>>> ======================== JOB MAP ========================
> >>>>>>
> >>>>>> Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
> >>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 0
> >>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 2
> >>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 1
> >>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 3
> >>>>>>
> >>>>>> Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
> >>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 4
> >>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 6
> >>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 5
> >>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 7
> >>>>>>
> >>>>>> =============================================================
> >>>>>> [node06.cluster:18491] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>> [node05.cluster:20949] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>> [node06.cluster:18491] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>>>>> [node05.cluster:20949] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>>>>> [node06.cluster:18491] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>> [node05.cluster:20949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>> [node06.cluster:18491] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>>>>> [node05.cluster:20949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>>>>> ....
> >>>>>>
> >>>>>> So, byobj_span would be okay.
> >>>>>> Of course, bynode and byslot should be okay.
> >>>>>> Could you take a look at orte_rmaps_rr_byobj again?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Tetsuya Mishima
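
P.S. Here is the rough sketch I mentioned above. It is a tiny, self-contained
toy program, not Open MPI code: it only mimics the interplay between the
second-pass "nprocs = 1" clamp and the object index that restarts at 0 on
every node, which is why I think every process ends up on the first socket.
The node_t struct, the helper get_obj_by_type_idx() (a stand-in for
opal_hwloc_base_get_obj_by_type), the obj_cursor field and the 2-node /
2-socket sizes are all my own inventions for this sketch; cpus_per_rank is
ignored, and the printed rank order is not meant to match ORTE's ranking,
only the socket choice.

/*
 * Toy model of the behavior described above -- this is NOT Open MPI code.
 * It only mimics two things from orte_rmaps_rr_byobj as I read it:
 *   (1) on the second pass nprocs is clamped to 1 per node (lines 495-501),
 *   (2) the object index i restarts at 0 on every node (line 529),
 * so the "socket" that comes back is always the first one.
 */
#include <stdio.h>

#define NUM_NODES   2
#define NUM_SOCKETS 2                /* objects of the target type per node */

typedef struct {
    const char *name;
    int slots;                       /* slots=1 when no slot count is given */
    int slots_inuse;
    int obj_cursor;                  /* the extra bookkeeping I am suggesting */
} node_t;

/* stand-in for opal_hwloc_base_get_obj_by_type(): return the idx-th socket */
static int get_obj_by_type_idx(int idx)
{
    return idx % NUM_SOCKETS;
}

static void map_job(int np, int use_cursor)
{
    node_t nodes[NUM_NODES] = { { "node05", 1, 0, 0 }, { "node06", 1, 0, 0 } };
    int rank = 0;

    printf("%s\n", use_cursor ? "-- with a per-node object cursor (idea) --"
                              : "-- object index restarts per node (current) --");
    /* keep cycling over the node list until every rank is placed,
     * as the mapper does once oversubscription is permitted */
    while (rank < np) {
        for (int n = 0; n < NUM_NODES && rank < np; n++) {
            int nprocs = nodes[n].slots - nodes[n].slots_inuse;
            if (nprocs < 1) {
                nprocs = 1;          /* the second-pass clamp */
            }
            for (int i = 0; i < nprocs && rank < np; i++) {
                int idx = use_cursor ? nodes[n].obj_cursor++ : i;
                nodes[n].slots_inuse++;
                printf("rank %d -> %s socket %d\n",
                       rank++, nodes[n].name, get_obj_by_type_idx(idx));
            }
        }
    }
}

int main(void)
{
    map_job(8, 0);                   /* mimics "everything on socket 0" */
    map_job(8, 1);                   /* alternates sockets on each node */
    return 0;
}

With use_cursor=0 the toy puts every rank on socket 0, which matches the
report-bindings output quoted above; with use_cursor=1 it alternates socket 0
and socket 1 on each node, which is the kind of bookkeeping I mean by "dummy
calls to move the object pointer". Of course the real fix would have to live
inside orte_rmaps_rr_byobj or opal_hwloc_base_get_obj_by_type itself, so
please take this only as an illustration of the idea.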