Please take a look at https://svn.open-mpi.org/trac/ompi/ticket/4317
On Feb 27, 2014, at 8:13 PM, tmish...@jcity.maeda.co.jp wrote:

> Hi Ralph, I can't operate our cluster for a few days, sorry.
>
> But now, I'm narrowing down the cause by browsing the source code.
>
> My best guess is line 529. opal_hwloc_base_get_obj_by_type will reset the
> object pointer to the first one when you move on to the next node.
>
> 529    if (NULL == (obj = opal_hwloc_base_get_obj_by_type(node->topology,
>                     target, cache_level, i, OPAL_HWLOC_AVAILABLE))) {
> 530        ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
> 531        return ORTE_ERR_NOT_FOUND;
> 532    }
>
> If node->slots=1, then nprocs is set to 1 in the second pass:
>
> 495    nprocs = (node->slots - node->slots_inuse) /
>                 orte_rmaps_base.cpus_per_rank;
> 496    if (nprocs < 1) {
> 497        if (second_pass) {
> 498            /* already checked for oversubscription permission, so at least put
> 499             * one proc on it
> 500             */
> 501            nprocs = 1;
>
> Therefore, opal_hwloc_base_get_obj_by_type is called for just one proc at a
> time on each node, which means the object we get is always the first one.
>
> It's not elegant, but I guess you need dummy calls of
> opal_hwloc_base_get_obj_by_type to move the object pointer to the right
> place, or you need to modify opal_hwloc_base_get_obj_by_type itself.
> [A minimal sketch of this per-node-cursor idea appears at the end of this thread.]
>
> Tetsuya
>
>> I'm having trouble seeing why it is failing, so I added some more debug
>> output. Could you run the failure case again with -mca rmaps_base_verbose 10?
>>
>> Thanks
>> Ralph
>>
>> On Feb 27, 2014, at 6:11 PM, tmish...@jcity.maeda.co.jp wrote:
>>
>>> Just checking the difference, nothing significant...
>>>
>>> Anyway, I guess it's due to the behavior when the slot count is missing
>>> (regarded as slots=1) and it's oversubscribed unintentionally.
>>>
>>> I'm going out now, so I can't verify it quickly. If I provide the
>>> correct slot counts, it will work, I guess. What do you think?
>>>
>>> Tetsuya
>>>
>>>> "restore" in what sense?
>>>>
>>>> On Feb 27, 2014, at 4:10 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>
>>>>> Hi Ralph, this is just for your information.
>>>>>
>>>>> I tried to restore the previous orte_rmaps_rr_byobj. Then I get the
>>>>> result below with this command line:
>>>>>
>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
>>>>> -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
>>>>> Data for JOB [31184,1] offset 0
>>>>>
>>>>> ========================   JOB MAP   ========================
>>>>>
>>>>> Data for node: node05  Num slots: 1  Max slots: 0  Num procs: 7
>>>>>   Process OMPI jobid: [31184,1] App: 0 Process rank: 0
>>>>>   Process OMPI jobid: [31184,1] App: 0 Process rank: 2
>>>>>   Process OMPI jobid: [31184,1] App: 0 Process rank: 4
>>>>>   Process OMPI jobid: [31184,1] App: 0 Process rank: 6
>>>>>   Process OMPI jobid: [31184,1] App: 0 Process rank: 1
>>>>>   Process OMPI jobid: [31184,1] App: 0 Process rank: 3
>>>>>   Process OMPI jobid: [31184,1] App: 0 Process rank: 5
>>>>>
>>>>> Data for node: node06  Num slots: 1  Max slots: 0  Num procs: 1
>>>>>   Process OMPI jobid: [31184,1] App: 0 Process rank: 7
>>>>>
>>>>> =============================================================
>>>>> [node06.cluster:18857] MCW rank 7 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>> [node05.cluster:21399] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
>>>>> [node05.cluster:21399] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>> [node05.cluster:21399] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
>>>>> [node05.cluster:21399] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>> [node05.cluster:21399] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>> [node05.cluster:21399] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
>>>>> [node05.cluster:21399] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>> ....
>>>>>
>>>>> Then I add "-hostfile pbs_hosts" and the result is:
>>>>>
>>>>> [mishima@manage work]$ cat pbs_hosts
>>>>> node05 slots=8
>>>>> node06 slots=8
>>>>> [mishima@manage work]$ mpirun -np 8 -hostfile ~/work/pbs_hosts
>>>>> -report-bindings -map-by socket:pe=2 -display-map
>>>>> ~/mis/openmpi/demos/myprog
>>>>> Data for JOB [30254,1] offset 0
>>>>>
>>>>> ========================   JOB MAP   ========================
>>>>>
>>>>> Data for node: node05  Num slots: 8  Max slots: 0  Num procs: 4
>>>>>   Process OMPI jobid: [30254,1] App: 0 Process rank: 0
>>>>>   Process OMPI jobid: [30254,1] App: 0 Process rank: 2
>>>>>   Process OMPI jobid: [30254,1] App: 0 Process rank: 1
>>>>>   Process OMPI jobid: [30254,1] App: 0 Process rank: 3
>>>>>
>>>>> Data for node: node06  Num slots: 8  Max slots: 0  Num procs: 4
>>>>>   Process OMPI jobid: [30254,1] App: 0 Process rank: 4
>>>>>   Process OMPI jobid: [30254,1] App: 0 Process rank: 6
>>>>>   Process OMPI jobid: [30254,1] App: 0 Process rank: 5
>>>>>   Process OMPI jobid: [30254,1] App: 0 Process rank: 7
>>>>>
>>>>> =============================================================
>>>>> [node05.cluster:21501] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>> [node05.cluster:21501] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
>>>>> [node05.cluster:21501] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>> [node05.cluster:21501] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
>>>>> [node06.cluster:18935] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>> [node06.cluster:18935] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
>>>>> [node06.cluster:18935] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>> [node06.cluster:18935] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
>>>>> ....
>>>>>
>>>>> I think the previous version's behavior would be close to what I expect.
>>>>>
>>>>> Tetsuya
>>>>>
>>>>>> They have 4 cores/socket and 2 sockets, 4 x 2 = 8 cores each in total.
>>>>>>
>>>>>> Here is the output of lstopo.
>>>>>>
>>>>>> [mishima@manage round_robin]$ rsh node05
>>>>>> Last login: Tue Feb 18 15:10:15 from manage
>>>>>> [mishima@node05 ~]$ lstopo
>>>>>> Machine (32GB)
>>>>>>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB)
>>>>>>     L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
>>>>>>     L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
>>>>>>     L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
>>>>>>     L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
>>>>>>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (6144KB)
>>>>>>     L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
>>>>>>     L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
>>>>>>     L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
>>>>>>     L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
>>>>>> ....
>>>>>>
>>>>>> I focused on byobj_span and bynode. I didn't notice byobj was modified, sorry.
>>>>>>
>>>>>> Tetsuya
>>>>>>
>>>>>>> Hmmm... what does your node look like again (sockets and cores)?
>>>>>>>
>>>>>>> On Feb 27, 2014, at 3:19 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>
>>>>>>>> Hi Ralph, I'm afraid to say your new "map-by obj" causes another problem.
>>>>>>>>
>>>>>>>> I get an overload message with this command line, as shown below:
>>>>>>>>
>>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
>>>>>>>> -display-map ~/mis/openmpi/demos/myprog
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> A request was made to bind to that would result in binding more
>>>>>>>> processes than cpus on a resource:
>>>>>>>>
>>>>>>>>    Bind to:     CORE
>>>>>>>>    Node:        node05
>>>>>>>>    #processes:  2
>>>>>>>>    #cpus:       1
>>>>>>>>
>>>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>>>> option to your binding directive.
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Then, I add "-bind-to core:overload-allowed" to see what happens.
>>>>>>>>
>>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
>>>>>>>> -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
>>>>>>>> Data for JOB [14398,1] offset 0
>>>>>>>>
>>>>>>>> ========================   JOB MAP   ========================
>>>>>>>>
>>>>>>>> Data for node: node05  Num slots: 1  Max slots: 0  Num procs: 4
>>>>>>>>   Process OMPI jobid: [14398,1] App: 0 Process rank: 0
>>>>>>>>   Process OMPI jobid: [14398,1] App: 0 Process rank: 1
>>>>>>>>   Process OMPI jobid: [14398,1] App: 0 Process rank: 2
>>>>>>>>   Process OMPI jobid: [14398,1] App: 0 Process rank: 3
>>>>>>>>
>>>>>>>> Data for node: node06  Num slots: 1  Max slots: 0  Num procs: 4
>>>>>>>>   Process OMPI jobid: [14398,1] App: 0 Process rank: 4
>>>>>>>>   Process OMPI jobid: [14398,1] App: 0 Process rank: 5
>>>>>>>>   Process OMPI jobid: [14398,1] App: 0 Process rank: 6
>>>>>>>>   Process OMPI jobid: [14398,1] App: 0 Process rank: 7
>>>>>>>>
>>>>>>>> =============================================================
>>>>>>>> [node06.cluster:18443] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>>>>> [node05.cluster:20901] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>>>>> [node06.cluster:18443] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>>>>> [node05.cluster:20901] MCW rank 3 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>>>>> [node06.cluster:18443] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>>>>> [node05.cluster:20901] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>>>>> [node06.cluster:18443] MCW rank 5 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>>>>> [node05.cluster:20901] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>>>>> Hello world from process 4 of 8
>>>>>>>> Hello world from process 2 of 8
>>>>>>>> Hello world from process 6 of 8
>>>>>>>> Hello world from process 0 of 8
>>>>>>>> Hello world from process 5 of 8
>>>>>>>> Hello world from process 1 of 8
>>>>>>>> Hello world from process 7 of 8
>>>>>>>> Hello world from process 3 of 8
>>>>>>>>
>>>>>>>> When I add "map-by obj:span", it works fine:
>>>>>>>>
>>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2,span
>>>>>>>> -display-map ~/mis/openmpi/demos/myprog
>>>>>>>> Data for JOB [14703,1] offset 0
>>>>>>>>
>>>>>>>> ========================   JOB MAP   ========================
>>>>>>>>
>>>>>>>> Data for node: node05  Num slots: 1  Max slots: 0  Num procs: 4
>>>>>>>>   Process OMPI jobid: [14703,1] App: 0 Process rank: 0
>>>>>>>>   Process OMPI jobid: [14703,1] App: 0 Process rank: 2
>>>>>>>>   Process OMPI jobid: [14703,1] App: 0 Process rank: 1
>>>>>>>>   Process OMPI jobid: [14703,1] App: 0 Process rank: 3
>>>>>>>>
>>>>>>>> Data for node: node06  Num slots: 1  Max slots: 0  Num procs: 4
>>>>>>>>   Process OMPI jobid: [14703,1] App: 0 Process rank: 4
>>>>>>>>   Process OMPI jobid: [14703,1] App: 0 Process rank: 6
>>>>>>>>   Process OMPI jobid: [14703,1] App: 0 Process rank: 5
>>>>>>>>   Process OMPI jobid: [14703,1] App: 0 Process rank: 7
>>>>>>>>
>>>>>>>> =============================================================
>>>>>>>> [node06.cluster:18491] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>>>>> [node05.cluster:20949] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
>>>>>>>> [node06.cluster:18491] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
>>>>>>>> [node05.cluster:20949] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
>>>>>>>> [node06.cluster:18491] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>>>>> [node05.cluster:20949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
>>>>>>>> [node06.cluster:18491] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
>>>>>>>> [node05.cluster:20949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
>>>>>>>> ....
>>>>>>>>
>>>>>>>> So, byobj_span would be okay.
>>>>>>>> Of course, bynode and byslot should be okay.
>>>>>>>> Could you take a look at orte_rmaps_rr_byobj again?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tetsuya Mishima
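
[Editor's note] The following is a minimal, self-contained sketch of the per-node object cursor that Tetsuya's analysis points at; it is not the actual Open MPI code. The names node_t, assign_procs, NUM_SOCKETS, and remember_cursor are hypothetical illustrations; only the idea taken from the thread is that the object index passed to opal_hwloc_base_get_obj_by_type must keep advancing across successive passes over the same node, rather than restarting at the first socket each time nprocs is forced to 1.

/* Toy model of round-robin-by-object mapping; compile with: gcc -std=c99 sketch.c */
#include <stdio.h>

#define NUM_NODES   2
#define NUM_SOCKETS 2   /* objects per node, as on the 2-socket nodes above */

typedef struct {
    const char *name;
    int next_obj;       /* per-node cursor: which socket gets the next proc */
} node_t;

/* Assign nprocs processes to node, round-robining over its sockets.
 * If remember_cursor is 0, the object index restarts at 0 on every call,
 * which is the behavior described above: with nprocs forced to 1 per pass,
 * every process lands on socket 0.  If it is 1, the cursor is kept, so a
 * later pass continues where the previous one stopped. */
static void assign_procs(node_t *node, int nprocs, int remember_cursor)
{
    int start = remember_cursor ? node->next_obj : 0;
    for (int i = 0; i < nprocs; i++) {
        int obj = (start + i) % NUM_SOCKETS;
        printf("%s: proc -> socket %d\n", node->name, obj);
    }
    if (remember_cursor)
        node->next_obj = (start + nprocs) % NUM_SOCKETS;
}

int main(void)
{
    node_t nodes[NUM_NODES] = { { "node05", 0 }, { "node06", 0 } };

    /* Eight ranks, one proc per pass per node (nprocs forced to 1 because
     * each node reports only one slot), as in the failing case above. */
    printf("-- object index restarts every pass (observed behavior) --\n");
    for (int pass = 0; pass < 4; pass++)
        for (int n = 0; n < NUM_NODES; n++)
            assign_procs(&nodes[n], 1, 0);

    printf("-- per-node cursor kept across passes (suggested behavior) --\n");
    for (int pass = 0; pass < 4; pass++)
        for (int n = 0; n < NUM_NODES; n++)
            assign_procs(&nodes[n], 1, 1);

    return 0;
}

Run as-is, the first loop prints socket 0 for every proc on both nodes, matching the all-ranks-on-socket-0 bindings reported earlier in the thread, while the second loop alternates sockets 0 and 1 on each node, which is the spread the "span" modifier produced.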