Try replacing --report-bindings with -mca hwloc_base_report_bindings 1 and see if that works
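For reference, the explicit MCA-parameter form would look something like this (a sketch assuming the rankfile rf_linpc_sunpc_tyr from your runs below; passing -mca on the command line and setting an OMPI_MCA_-prefixed environment variable are two equivalent ways to set the same parameter):

```shell
# Same intent as --report-bindings, but set as an explicit MCA parameter:
mpiexec -mca hwloc_base_report_bindings 1 -np 4 -rf rf_linpc_sunpc_tyr hostname

# Equivalent, via the environment instead of the command line:
OMPI_MCA_hwloc_base_report_bindings=1 mpiexec -np 4 -rf rf_linpc_sunpc_tyr hostname
```

If the error does not appear with either form, that would point at how the -report-bindings command-line option is being translated into the MCA value (the "1,1" in your error message instead of a plain "1").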
On Aug 7, 2014, at 4:04 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
>> I can't replicate - this worked fine for me. I'm at a loss as
>> to how you got that error, as it would require some strange
>> error in the report-bindings option. If you remove that option
>> from your cmd line, does the problem go away?
>
> Yes.
>
> tyr openmpi_1.7.x_or_newer 468 mpiexec -np 4 -rf rf_linpc_sunpc_tyr hostname
> tyr.informatik.hs-fulda.de
> linpc0
> linpc1
> sunpc1
>
>
> tyr openmpi_1.7.x_or_newer 469 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> --------------------------------------------------------------------------
> An invalid value was supplied for an enum variable.
>
> Variable     : hwloc_base_report_bindings
> Value        : 1,1
> Valid values : 0: f|false|disabled, 1: t|true|enabled
> --------------------------------------------------------------------------
> tyr.informatik.hs-fulda.de
> [tyr.informatik.hs-fulda.de:29900] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
> [linpc0:04217] MCW rank 0 is not bound (or bound to all available processors)
> [linpc1:23107] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> linpc0
> linpc1
> sunpc1
> tyr openmpi_1.7.x_or_newer 470
>
>
> Kind regards
>
> Siegmar
>
>
>> On Aug 5, 2014, at 12:56 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>
>>> Hi,
>>>
>>> yesterday I installed openmpi-1.8.2rc3 on my machines
>>> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE
>>> Linux 12.1 x86_64) with Sun C 5.12. I get an error
>>> if I use a rankfile for all three architectures.
>>> The error message depends on the local machine that
>>> I use to run "mpiexec". I get a different error if I
>>> use two "Sparc64 VII" machines (see below).
>>>
>>> tyr openmpi_1.7.x_or_newer 109 cat rf_linpc_sunpc_tyr
>>> rank 0=linpc0 slot=0:0-1;1:0-1
>>> rank 1=linpc1 slot=0:0-1
>>> rank 2=sunpc1 slot=1:0
>>> rank 3=tyr slot=1:0
>>> tyr openmpi_1.7.x_or_newer 110
>>>
>>>
>>> I get the following message if I run "mpiexec" on
>>> Solaris 10 Sparc.
>>>
>>> tyr openmpi_1.7.x_or_newer 110 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
>>> --------------------------------------------------------------------------
>>> An invalid value was supplied for an enum variable.
>>>
>>> Variable     : hwloc_base_report_bindings
>>> Value        : 1,1
>>> Valid values : 0: f|false|disabled, 1: t|true|enabled
>>> --------------------------------------------------------------------------
>>> [tyr.informatik.hs-fulda.de:26960] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
>>> tyr.informatik.hs-fulda.de
>>> [linpc1:12109] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
>>> [linpc0:26642] MCW rank 0 is not bound (or bound to all available processors)
>>> linpc1
>>> linpc0
>>> sunpc1
>>> tyr openmpi_1.7.x_or_newer 111
>>>
>>>
>>>
>>> I get the following message if I run "mpiexec" on
>>> Solaris 10 x86_64 or Linux x86_64.
>>>
>>> sunpc1 openmpi_1.7.x_or_newer 109 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
>>> --------------------------------------------------------------------------
>>> An invalid value was supplied for an enum variable.
>>>
>>> Variable     : hwloc_base_report_bindings
>>> Value        : 1,1
>>> Valid values : 0: f|false|disabled, 1: t|true|enabled
>>> --------------------------------------------------------------------------
>>> [sunpc1:02931] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
>>> sunpc1
>>> [linpc0:26850] MCW rank 0 is not bound (or bound to all available processors)
>>> [linpc1:12386] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
>>> linpc0
>>> linpc1
>>> --------------------------------------------------------------------------
>>> Open MPI tried to bind a new process, but something went wrong. The
>>> process was killed without launching the target application. Your job
>>> will now abort.
>>>
>>> Local host:        tyr
>>> Application name:  /usr/local/bin/hostname
>>> Error message:     hwloc_set_cpubind returned "Error" for bitmap "2"
>>> Location:          ../../../../../openmpi-1.8.2rc3/orte/mca/odls/default/odls_default_module.c:551
>>> --------------------------------------------------------------------------
>>> sunpc1 openmpi_1.7.x_or_newer 110
>>>
>>>
>>>
>>> The rankfile worked for older versions of Open MPI.
>>>
>>> tyr openmpi_1.7.x_or_newer 139 ompi_info | grep MPI:
>>>                 Open MPI: 1.8.2a1r31804
>>> tyr openmpi_1.7.x_or_newer 140 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
>>> [tyr.informatik.hs-fulda.de:27171] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
>>> tyr.informatik.hs-fulda.de
>>> [linpc1:12790] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
>>> [linpc0:27221] MCW rank 0 is not bound (or bound to all available processors)
>>> linpc1
>>> linpc0
>>> [sunpc1:03046] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
>>> sunpc1
>>> tyr openmpi_1.7.x_or_newer 141
>>>
>>>
>>>
>>> I get the following error if I use two Sparc machines
>>> (Sun M4000 servers with two quad-core Sparc64 VII processors
>>> and two hardware threads per core). I'm not sure if this
>>> worked before or if I have to use different options to make
>>> it work.
>>>
>>> tyr openmpi_1.7.x_or_newer 151 cat rf_rs0_rs1
>>> rank 0=rs0 slot=0:0-7
>>> rank 1=rs0 slot=1
>>> rank 2=rs1 slot=0
>>> rank 3=rs1 slot=1
>>> tyr openmpi_1.7.x_or_newer 152
>>>
>>> rs0 openmpi_1.7.x_or_newer 104 mpiexec --report-bindings --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
>>> [rs0.informatik.hs-fulda.de:26085] [[28578,0],0] ORTE_ERROR_LOG: Not found in
>>> file ../../../../../openmpi-1.8.2rc3/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 279
>>> [rs0.informatik.hs-fulda.de:26085] [[28578,0],0] ORTE_ERROR_LOG: Not found in
>>> file ../../../../openmpi-1.8.2rc3/orte/mca/rmaps/base/rmaps_base_map_job.c at line 285
>>> rs0 openmpi_1.7.x_or_newer 105
>>>
>>>
>>> It works with the following command.
>>>
>>> rs0 openmpi_1.7.x_or_newer 107 mpiexec --report-bindings -np 4 --host rs0,rs1 --bind-to hwthread hostname
>>> [rs0.informatik.hs-fulda.de:26102] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
>>> [rs0.informatik.hs-fulda.de:26102] MCW rank 1 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
>>> rs0.informatik.hs-fulda.de
>>> rs0.informatik.hs-fulda.de
>>> rs1.informatik.hs-fulda.de
>>> [rs1.informatik.hs-fulda.de:28740] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
>>> [rs1.informatik.hs-fulda.de:28740] MCW rank 3 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
>>> rs1.informatik.hs-fulda.de
>>> rs0 openmpi_1.7.x_or_newer 108
>>>
>>>
>>> I would be grateful if somebody could fix the problem. Please let
>>> me know if I can provide anything else. Thank you very much for
>>> any help in advance.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/24907.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/24936.php