Try replacing --report-bindings with -mca hwloc_base_report_bindings 1 and see if that works.
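
For example (untested here - this just substitutes the MCA parameter for
the command-line flag in your rankfile run):

  mpiexec -mca hwloc_base_report_bindings 1 -np 4 -rf rf_linpc_sunpc_tyr hostname

Setting OMPI_MCA_hwloc_base_report_bindings=1 in the environment should be
equivalent.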


On Aug 7, 2014, at 4:04 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
> 
>> I can't replicate - this worked fine for me. I'm at a loss as
>> to how you got that error, as it would require some strange
>> error in the report-bindings option. If you remove that option
>> from your cmd line, does the problem go away?
> 
> Yes.
> 
> tyr openmpi_1.7.x_or_newer 468 mpiexec -np 4 -rf rf_linpc_sunpc_tyr hostname
> tyr.informatik.hs-fulda.de
> linpc0
> linpc1
> sunpc1
> 
> 
> tyr openmpi_1.7.x_or_newer 469 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> --------------------------------------------------------------------------
> An invalid value was supplied for an enum variable.
> 
>  Variable     : hwloc_base_report_bindings
>  Value        : 1,1
>  Valid values : 0: f|false|disabled, 1: t|true|enabled
> --------------------------------------------------------------------------
> tyr.informatik.hs-fulda.de
> [tyr.informatik.hs-fulda.de:29900] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
> [linpc0:04217] MCW rank 0 is not bound (or bound to all available processors)
> [linpc1:23107] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> linpc0
> linpc1
> sunpc1
> tyr openmpi_1.7.x_or_newer 470 
> 
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
> 
> 
>> On Aug 5, 2014, at 12:56 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>> 
>>> Hi,
>>> 
>>> Yesterday I installed openmpi-1.8.2rc3 on my machines
>>> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE
>>> Linux 12.1 x86_64) with Sun C 5.12. I get an error
>>> on all three architectures if I use a rankfile.
>>> The error message depends on the local machine on which
>>> I run "mpiexec". I get a different error if I use
>>> two "Sparc64 VII" machines (see below).
>>> 
>>> tyr openmpi_1.7.x_or_newer 109 cat rf_linpc_sunpc_tyr
>>> rank 0=linpc0 slot=0:0-1;1:0-1
>>> rank 1=linpc1 slot=0:0-1
>>> rank 2=sunpc1 slot=1:0
>>> rank 3=tyr slot=1:0
>>> tyr openmpi_1.7.x_or_newer 110 
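>>> 
>>> For reference, the slot syntax here is <socket>:<core-list>, so the
>>> intended bindings are:
>>> 
>>>   rank 0 -> linpc0: cores 0-1 on socket 0 and cores 0-1 on socket 1
>>>   rank 1 -> linpc1: cores 0-1 on socket 0
>>>   rank 2 -> sunpc1: core 0 on socket 1
>>>   rank 3 -> tyr:    core 0 on socket 1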
>>> 
>>> 
>>> I get the following message if I run "mpiexec" on
>>> Solaris 10 Sparc.
>>> 
>>> tyr openmpi_1.7.x_or_newer 110 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
>>> --------------------------------------------------------------------------
>>> An invalid value was supplied for an enum variable.
>>> 
>>> Variable     : hwloc_base_report_bindings
>>> Value        : 1,1
>>> Valid values : 0: f|false|disabled, 1: t|true|enabled
>>> --------------------------------------------------------------------------
>>> [tyr.informatik.hs-fulda.de:26960] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
>>> tyr.informatik.hs-fulda.de
>>> [linpc1:12109] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
>>> [linpc0:26642] MCW rank 0 is not bound (or bound to all available processors)
>>> linpc1
>>> linpc0
>>> sunpc1
>>> tyr openmpi_1.7.x_or_newer 111 
>>> 
>>> 
>>> 
>>> I get the following message if I run "mpiexec" on
>>> Solaris 10 x86_64 or Linux x86_64.
>>> 
>>> sunpc1 openmpi_1.7.x_or_newer 109 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
>>> --------------------------------------------------------------------------
>>> An invalid value was supplied for an enum variable.
>>> 
>>> Variable     : hwloc_base_report_bindings
>>> Value        : 1,1
>>> Valid values : 0: f|false|disabled, 1: t|true|enabled
>>> --------------------------------------------------------------------------
>>> [sunpc1:02931] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
>>> sunpc1
>>> [linpc0:26850] MCW rank 0 is not bound (or bound to all available processors)
>>> [linpc1:12386] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
>>> linpc0
>>> linpc1
>>> --------------------------------------------------------------------------
>>> Open MPI tried to bind a new process, but something went wrong.  The
>>> process was killed without launching the target application.  Your job
>>> will now abort.
>>> 
>>> Local host:        tyr
>>> Application name:  /usr/local/bin/hostname
>>> Error message:     hwloc_set_cpubind returned "Error" for bitmap "2"
>>> Location:          ../../../../../openmpi-1.8.2rc3/orte/mca/odls/default/odls_default_module.c:551
>>> --------------------------------------------------------------------------
>>> sunpc1 openmpi_1.7.x_or_newer 110 
>>> 
>>> 
>>> 
>>> 
>>> The rankfile worked for older versions of Open MPI.
>>> 
>>> tyr openmpi_1.7.x_or_newer 139 ompi_info | grep MPI:
>>>               Open MPI: 1.8.2a1r31804
>>> tyr openmpi_1.7.x_or_newer 140 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
>>> [tyr.informatik.hs-fulda.de:27171] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
>>> tyr.informatik.hs-fulda.de
>>> [linpc1:12790] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
>>> [linpc0:27221] MCW rank 0 is not bound (or bound to all available processors)
>>> linpc1
>>> linpc0
>>> [sunpc1:03046] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
>>> sunpc1
>>> tyr openmpi_1.7.x_or_newer 141 
>>> 
>>> 
>>> 
>>> 
>>> I get the following error if I use two Sparc machines
>>> (Sun M4000 servers with two quad-core Sparc64 VII processors
>>> and two hardware threads per core). I'm not sure whether this
>>> worked before or whether I have to use different options to
>>> make it work.
>>> 
>>> tyr openmpi_1.7.x_or_newer 151 cat rf_rs0_rs1
>>> rank 0=rs0 slot=0:0-7
>>> rank 1=rs0 slot=1
>>> rank 2=rs1 slot=0
>>> rank 3=rs1 slot=1
>>> tyr openmpi_1.7.x_or_newer 152 
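>>> 
>>> (If I read the mpirun man page correctly, slot=0:0-7 should cover
>>> the eight entries 0-7 on socket 0, while a bare number without a
>>> colon, as in slot=1, is taken as a logical cpu rather than a socket.)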
>>> 
>>> rs0 openmpi_1.7.x_or_newer 104 mpiexec --report-bindings --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
>>> [rs0.informatik.hs-fulda.de:26085] [[28578,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-1.8.2rc3/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 279
>>> [rs0.informatik.hs-fulda.de:26085] [[28578,0],0] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-1.8.2rc3/orte/mca/rmaps/base/rmaps_base_map_job.c at line 285
>>> rs0 openmpi_1.7.x_or_newer 105 
>>> 
>>> 
>>> It works with the following command.
>>> 
>>> rs0 openmpi_1.7.x_or_newer 107 mpiexec --report-bindings -np 4 --host rs0,rs1 --bind-to hwthread hostname
>>> [rs0.informatik.hs-fulda.de:26102] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
>>> [rs0.informatik.hs-fulda.de:26102] MCW rank 1 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
>>> rs0.informatik.hs-fulda.de
>>> rs0.informatik.hs-fulda.de
>>> rs1.informatik.hs-fulda.de
>>> [rs1.informatik.hs-fulda.de:28740] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
>>> [rs1.informatik.hs-fulda.de:28740] MCW rank 3 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
>>> rs1.informatik.hs-fulda.de
>>> rs0 openmpi_1.7.x_or_newer 108 
>>> 
>>> 
>>> I would be grateful if somebody could fix the problem. Please let
>>> me know if I can provide anything else. Thank you very much in
>>> advance for any help.
>>> 
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
